Measurement and Evaluation in the Schools



LOUIS J. KARMEL 

University of Kentucky 


The Macmillan Company / Collier-Macmillan Limited, London



© Copyright, Louis J. Karmel, 1970

All rights reserved. No part of this book may be reproduced or
transmitted in any form or by any means, electronic or mechanical,
including photocopying, recording or by any information storage and
retrieval system, without permission in writing from the Publisher.

First Printing 

A portion of this material has been adapted from Testing in Our Schools,
by Louis J. Karmel, published by The Macmillan Company, 1966.
© Copyright 1966 by Louis J. Karmel.

Library of Congress catalog card number: 73-80307 

The Macmillan Company

Collier-Macmillan Canada, Ltd., Toronto, Ontario


Printed in the United States of America



This book is dedicated with love to my little women

Elizabeth Anne
Catherine Lee
Mary Patricia



Preface 


This book is written primarily for teachers. It may also be used in introductory
measurement courses for guidance and psychology students. The
essence of this book is to communicate measurement and evaluation concepts 
to the beginning student in simple and direct language. Measurement con- 
cepts that one needs to know in actual school practice are stressed. In attempt- 
ing to do this, accuracy and scholarship have not been neglected, but have 
been translated into readily understandable terms. 

The organizational plan is intended to increase, not to discourage, student 
interest. Historical and scientific information is integrated throughout the 
book in relevant sections. The first two chapters deal with questions uppermost
in the minds of contemporary students. For example, Why do we give
tests? What about cultural bias? What is the relationship between IQ scores
and innate ability? What do tests mean for disadvantaged students? Do our
tests indicate that Negroes are equal to or inferior to white students? In
the discussion of these and other questions historical and contemporary 
research and thinking are presented. An attempt is made to give candid 
answers. 

The rationale of presenting issues before actual exposure to the details of 
measurement is based on the premise that involvement through visceral and 
intellectual stimulation increases motivation and breaks down some of the 
hostility many students have toward testing. Thus, the first two chapters
attempt to capitalize on general testing interest and to present and clarify 
erroneous testing information propagated by test zealots and critics. 

After the first two chapters the reader is gradually introduced, in nontechnical
language, to technical aspects of testing and to practical applications
of measurement. Standardized testing is not relegated to a secondary position; 
a full and detailed explanation of concepts that the nontesting expert needs 
to know is presented. Many measurement textbooks that are geared primarily
for teachers spend proportionately little time on this area and devote
themselves mainly to teacher-made tests. The need for teachers and other
school personnel to understand standardized testing cannot be overemphasized.
Many statements critical of testing could be eliminated if the test consumer
used tests more appropriately. Teachers, especially in the elementary
grades, deal with standardized testing and need to know the significance of
these tests.

In a recent investigation of the role of the teacher in testing, Dr. David
A. Goslin,¹ a staff sociologist at the Russell Sage Foundation, found that the
average teacher is poorly prepared to use standardized test information. He 
found that this was especially true among elementary school teachers, who 
were most often responsible for the administration, interpretation, and 
use of standardized tests. Dr. Goslin emphasized the classroom teacher’s
involvement with, and responsibility for, standardized testing and their 
lack of adequate training for this task. This book attempts to meet this need. 

Although standardized tests are discussed at length, the classroom test is
not neglected. Both the good and bad points of teacher-made tests are discussed
in the context of the school setting. That is, realistic suggestions for
the construction of various types of teacher-made tests are presented.

In the area of statistics the author has attempted to present this material 
based on his years of experience in the public schools and in teaching teachers 
at the university level. Teachers as a group do not understand mathematical
concepts as easily as they do verbal terms. Therefore, the statistics section
has been written for the nonmathematical teacher. Only statistics that the
teacher needs to know for classroom use and for an understanding of standardized
tests are presented. Formulas and involved statistical manipulations
have been kept to a minimum. 

A full chapter is devoted to college entrance examinations because of the 
contemporary importance of admission to college and obtaining a college 
degree. An attempt is made to convey the basic features of the two national 
college testing programs. In addition, research findings concerning how 
college entrance tests relate to college success, the place of entrance tests 
in admission procedures, and the value of preparation for the tests are 
reviewed. 

In keeping with the theme of involvement, additional readings are cited 
in the particular section to which they pertain rather than at the end of each 
chapter. The reference style conforms to the American Psychological Associa- 
tion’s format. That is, the name and date are given in the context of the 
discussion, for example, Jones (1968). The sources can be found by consulting 
the reference list at the end of each chapter. References are arranged in 
alphabetical order. 

In summary, this book attempts to convey basic measurement and evaluation
concepts in understandable language. Case studies and “living” language
are used.

¹ D. A. Goslin, Teachers and Testing. New York: Russell Sage Foundation.

I want to thank my wife, Marylin Odom Karmel, a former school teacher
herself, for helping in this endeavor. Her professional advice, knowledge of
the public schools, and knowledge of teachers and their problems were
extremely useful in gearing the text to school usage. Her ability to say things
clearly and avoid the professorial syndrome of verbosity has been of considerable
value.

I would also like to acknowledge my everlasting gratitude to Dr. Luther 
R. Taff of the University of North Carolina (Chapel Hill), who provided me 
with the opportunity for growth both personally and professionally. And last
but not least I would like to thank my students, whose questions, ideas, and
approaches are reflected in this book. 


Lexington, Kentucky


L.J.K. 



Contents 


PART ONE: Reasons for Testing and
Contemporary Issues


Chapter 1 The Reasons We Give Tests
Why Do Teachers Use Tests?


Chapter 2 Contemporary Issues and Problems
Major Criticisms
Problem Areas of Standardized Tests
A Practical Approach


PART TWO: Testing the Test


Chapter 3 Test Ethics, Standards, and Procedures
Some History
Ethics
Standards
Testing and Scoring Procedures
Factors Affecting Test Performance


Chapter 4 Validity and Reliability
How to Evaluate a Test
Validity
Reliability
Practical Concerns

Chapter 5 Statistics, Norms, and Standard Scores
Some Statistics
Ranking Scores
Frequency Distribution
Class Interval
Graphic Representation
Measures of Central Tendency
Measures of Variability
Norms
Standard Scores
Intelligence Quotients
Practical Usage of Norms


Chapter 6 Sources of Test Information

Special Resources 
Texts and Reference Books 
Test Publishers 
Journals 


PART THREE: Individual Tests of
Intelligence and Personality


Chapter 7 Individual Tests of Intelligence
Stanford-Binet Intelligence Scale
Wechsler Scales
Evaluation of the Binet and Wechsler
Nonlanguage Tests
Culture-Fair Tests
Infant and Preschool Tests


Chapter 8 Projective Techniques
Rorschach
Thematic Apperception Test
Other Projective Story Techniques
Other Types of Projective Techniques
Evaluation of Projective Techniques


PART FOUR: Group Standardized Testing 


Chapter 9 Scholastic and Special Aptitude Tests
Scholastic Aptitude Tests
Special Aptitude Tests
Vocational Aptitude Tests
Graduate School Tests
Using the Results of Aptitude Tests


Chapter 10 Achievement Tests
Construction of Achievement Tests
Differences Between Teacher-Made and Standardized Tests
Types of Achievement Tests
Major Uses in the School


Chapter 11 College Entrance Examinations
College-Administered Tests
College Entrance Examination Board
American College Testing Program
Research: A Brief Review of Recent Studies
The Practical Meaning of College Entrance Examinations 


Chapter 12 Personality, Attitude, and Interest Inventories 
Personality Inventories 
Attitude Inventories 
Interest Inventories 



PART FIVE: Teacher-Made Tests
and Grades


Chapter 13 Teacher-Made Tests
Standardized Versus Teacher-Made Tests
Purposes of Teacher-Made Tests
Planning Ahead
Defining Objectives
Steps in Test Construction and Administration
Evaluation of Your Test
Types of Teacher-Made Tests
Essay Versus Objective Tests
Teachers, Tests, and Reality


Chapter 14 The Essay Test
Reliability
Validity
Advantages and Disadvantages of the Essay Test
Construction of Essay Tests
Grading Essay Tests
A Case History


Chapter 15 The Objective Test
Characteristics
General Rules for Item Writing
Types of Objective Items
Mechanical Operations
Item Analysis
A Case History


Chapter 16 Grades and Report Cards
Philosophy
Purposes of Grades
Assigning Test Grades
Assigning Report Card Grades
Report Cards
Evaluation and Reality



PART SIX: A School Testing and
Evaluation Program


Chapter 17 Planning and Using a Testing Program
Determining the Objectives
When to Begin Planning
Selecting Tests
Suggestions for a School Testing Program
Scheduling of Tests
Test Orientation
Recording Test Scores
A Sample Testing Program
Reporting Test Results to Students and Parents


Appendix A Major Publishers of Standardized Tests

Appendix B Glossary of Common Measurement Terms

Appendix C Representative Tests


INDEX



Figures 


1. [illegible] Test 1, Word Meaning, of the Stanford Achievement [illegible]

2. [illegible] the Pin-Punch Answer Pad of the [illegible]

3-5. [illegible]

6. Histogram.

7. Frequency polygon.

8-9. [illegible] standard deviations and percentages of cases in [illegible]

10. [illegible] diagram and standard score scales.

11-15. [illegible]

16. Make A Picture Story [illegible]

17. Designs of [illegible]

18. Sample items and [illegible] Abilities Test.

19. Selected sample items [illegible] Kuhlmann-Finch Tests [illegible]

20. Sample [illegible] questions from Part I [illegible]

21. [illegible] College Ability [illegible]

22. Selected items from [illegible] Booklet 1, Form L.

23. Sample item from the Differential Aptitude Tests [illegible]

24. Sample items from [illegible]

25. Sample items [illegible]

26. Sample questions from Reading, Cooperative Primary Tests, Form 12A. 

27. Sample items from Sequential Tests of Educational Progress— Mathe- 
matics, Form 2A. 

28. Sample items from Stanford Achievement Test, Intermediate I Battery,
Form W.

29. Example of record sheet used in the Buswell-John Diagnostic Test for
Fundamental Processes in Arithmetic. 

30. Service centers for the CEEB. 

31. Sample set of items from Thorndike Dimensions of Temperament. 

32. Sample MMPI computer report. 

33. Diagnostic profile for Survey of Study Habits and Attitudes. 

34. Average profiles of males and females on the Allport-Vernon-Lindzey 
Study of Values.

35. Example of items from men’s SVIB booklet (Form T399). 

36. An example of a SVIB Profile. 

37. Instructions and Sample Problems from Kuder Preference Record Voca- 
tional. 

38. Profile leaflet for Kuder General Interest Survey.

39. Sample report form for Kuder Occupational Interest Survey. 

40. MVII profile sheet. 

41. Test blueprint for a unit on How Our National Government Functions. 

42. Example of a multiple-choice item testing depth of knowledge. 

43. Chart showing the evolution of a testing program. 



Tables 


1. Expectancy Table Prepared from the Grid in Figure 5

2. Relationship Between APT-N and APT-LU Scores and Grades in
Science of 294 Seventh-Grade Students 

3. Standard Errors of Measurement for Given Values of Reliability Co- 
efficient and Standard Deviation 

4. Raw Scores and Ranks of Students on Two Forms of an Arithmetic 
Test 

5. Scores on Vocabulary Test for Ninth-Grade Students 

6. Frequency Distribution of Scores on a Vocabulary Test for Ninth- 
Graders 

7. Scores on an IQ Test 

8. Frequency Distribution of IQ Scores with Scores Grouped by Class
Intervals 

9. Average Incomes of Nine Wage Earners Using the Mean and Median 

10. Standard Deviations and Percentile Equivalents 

11. Scores of Ten Students on Two Tests 

12. Grade Placement at Time of Testing 

13. Raw Scores and Their Grade Equivalents for the X Test of Social
Studies for Junior High Students 

14. Percentile Norms for Girls for the Differential Aptitude Test, Form L, 
Fall (first semester), Tenth Grade 

15. Percentile Norms for Boys for the Differential Aptitude Test, Form L, 
Fall (first semester), Tenth Grade 

16. Percentile Ranks of Unselected High School Seniors 

17. Percentile Ranks of College-Bound High School Seniors 

18. Percentile Ranks for College Freshmen Enrolled in Junior Colleges and 
Technical Schools 

19. Percentile Ranks for College Freshmen Enrolled in Doctoral-Granting 
Institutions 

20. Distribution of the 1937 Standardization Group

21. Intelligence Classifications

22. Classification of Mental Defectives According to IQ’s (WAIS)

23. Stanford Achievement Test: Subtests for Various Grades




24. Description of TDOT Dimensions 

25. Advantages and Limitations of Standardized and Nonstandardized Tests
of Achievement 

26. A Comparison of Essay and Objective Tests 

27. Part of a Teacher-Made Answer Sheet



PART ONE


Reasons for Testing
and Contemporary
Issues


CHAPTER 1

The Reasons We Give Tests

“[illegible] death,” states an irate [illegible]. “I am beginning to
[illegible].” [illegible] voices raise the issue of [illegible] student
learning. These [illegible] legitimate [illegible] objective and [illegible]
manner. This chapter [illegible] general basis for [illegible] answers to
these and other questions; Chapter 2 will deal with specific issues
[illegible] problems. [illegible] the school situation [illegible] about
testing [illegible] philosophy [illegible] very difficult [illegible]
variables [illegible]



Basically, schools use tests as educational tools to promote individualized
instruction. Individualized instruction is the major reason for testing. It
implies that the school’s basic duty to the child is to know him as an
individual. Inherent in this is a recognition of the dignity and worth of the
individual and his unique qualities. The basic premise for giving tests is the
assumption that individuals differ and that education must be geared to
these differences so that each person may develop his or her own unique 
potential. 

If we accept this assumption of individual differences then we must also 
make certain other assumptions. We must realize, for example, that some 
children are ready to read at four whereas others are not ready until they are 
six or eight. Some children find school a “snap,” and others find it a “drag.” 
Certain children find mathematics easy, and others are lost. The child who 
finds math difficult may excel in English, whereas his mathematically
superior peer may find English his most difficult subject. Some children 
may be academically superior in all subjects; others may be slow to learn in 
all areas. Some children may reveal a great deal of promise but obtain poor
grades in school. Other children may have personal problems that interfere 
with academic success. 

We will now turn our attention to the nature of tests, how they
help in identifying some of the children previously mentioned, and how
they can be used to individualize instruction. Before we begin our discussion
of the reasons for testing we must first know the meaning of the
term test.


Webster’s New Collegiate Dictionary defines the word test as “any critical
examination or trial . . . means of trial . . . subjection to conditions that
show the real character of a person or thing in a certain particular,” and
“that with which anything is compared for proof of genuineness.” According
to Webster the use of test in education means, “Any series of questions or
exercises or other means of measuring the skill, knowledge, intelligence,
capacities or aptitudes of an individual or group.”

Generally when we use the word test we mean to appraise something or
someone. That is, to test is to evaluate or measure something or someone
against given criteria in order to obtain data that reveal relationships between
our subject and our frame of reference. For example, in our space program
the vehicles and men are subject to simulated conditions of space before the
actual launching. They are being tested to ascertain their future performance
in outer space. In a school setting the child is tested in order to discover the
degree of learning that has taken or may take place. Mary failed her spelling
test. Johnny got [illegible] on his history test. Billy’s test scores reveal a high
potential for school success. Ruth’s test scores show average potential for
success in secretarial tasks. Mary and Johnny’s past learnings are being
measured; Billy and Ruth’s future chances of success are being evaluated,
that is, tested.




Teacher-Made Tests 




In the classroom the teacher may approach testing in many different ways.
She may, for example, ask her students to write “in your own words the
meaning of the Fifth Amendment of the United States Constitution.” Or she
may say, “Discuss the implications of the Magna Charta on the United States
political system.” Such questions are generally referred to as essay tests.

In addition to or instead of the essay examination the classroom teacher 
may use the objective test. The word objective encompasses many different
types of tests. Objective, as used in testing, means that the scoring is not
influenced by the opinion, knowledge, or skill of the person scoring the test,
or by whether or not the person taking the test and the person scoring the test
“communicate.” Thus, any test which has predetermined correct answers
may be called objective. For example, in the two previously mentioned
essay questions the correct answer is subject to interpretation by the teacher.
The scoring is, therefore, subjective. On the other hand, the objective test 
leaves little, if any, latitude for interpretation of answers because there is 
only one correct answer. 

The true-false question and the completion and multiple-choice item are 
examples of the objective test. The following are items illustrative of these
tests.


True-False. The scoring of an essay test is subjective. T F

Multiple Choice. In objective testing, the term objective refers to the 
method of 

1. identifying learning outcomes. 

2. selecting test content. 

3. presenting the problem. 

4. scoring the answers. 

Completion. According to the author of your text, schools use tests
as educational tools to promote ____________.
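
The idea that an objective test is scored against predetermined answers, with no room for the scorer's opinion, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the item labels, and the answer key are hypothetical, and the key simply reflects what the text itself says about the three sample items above (essay scoring is subjective; objective refers to scoring the answers; schools use tests to promote individualized instruction).

```python
def score_objective_test(answer_key, responses):
    """Count the responses that match a predetermined answer key.

    The scorer's opinion never enters: a response either matches the
    keyed answer (ignoring case and surrounding spaces) or it does not.
    """
    score = 0
    for item, correct in answer_key.items():
        given = responses.get(item, "")
        if given.strip().lower() == correct.strip().lower():
            score += 1
    return score


# Hypothetical key for the three sample items (labels are assumptions).
ANSWER_KEY = {
    "true_false": "T",
    "multiple_choice": "4",
    "completion": "individualized instruction",
}

# A made-up student paper: two answers match the key, one does not.
student = {
    "true_false": "T",
    "multiple_choice": "2",
    "completion": "Individualized instruction",
}

print(score_objective_test(ANSWER_KEY, student))  # prints 2
```

Any two scorers applying this key to the same paper arrive at the same score, which is the sense in which the text calls such tests objective. An essay answer, by contrast, could not be reduced to a lookup of this kind.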


Why Do Teachers Use Tests? 

The teacher’s primary role in the classroom is to teach; whether he 
succeeds may be ascertained by the intellectual growth and development of 
his pupils. In order to gauge this progress the teacher must institute evaluative
techniques. These techniques include essay and objective tests and
procedures such as day-to-day classroom observations and teacher judgment
based on professional experience and intuition. All of these assist
the teacher in evaluating pupil progress. The discussion here is
focused on the reasons for using formal evaluative instruments called tests.



Certification of Pupil Achievement 

Many educators are dissatisfied with our present system of grading. It is
true that poorly constructed tests may make our present system even worse.
It is also true, however, that students must be evaluated in some manner,
whether it be by marks, letters of recommendation, or comments on academic
levels of proficiency. The test, if properly conceived and executed, can be of
assistance in the verification of pupil progress. The teacher, for example,
needs to know if pupils have learned enough in first grade to warrant promotion
to second grade. The teacher needs to know about pupil achievements
in order to certify these accomplishments to other educational institutions
and the world of work. 


Measuring Outcomes of Instruction 

Tests aid in determining the learning outcomes of classroom instruction. 
The teacher-made test is a reflection of what the individual teacher considers 
important. If a teacher-made test stresses concepts or understanding over 
factual recall, this is indicative of the teacher’s basic educational objective. 
The teacher can then evaluate the success or failure of classroom learning in
relation to the test results. An analysis of student responses to the test can
be helpful to the teacher in adjusting the level and direction of classroom
instruction. 


Incentive 

Well-constructed tests which reflect classroom instruction can increase
student learning by helping to develop study habits and directing intellectual
energy toward the desired educational objectives. Test results can reveal
areas of strength and weakness of individual students and act as motivating
devices for future study. Of course, poorly constructed tests that are used 
as disciplinary instruments may have the opposite effect and discourage 
student incentive. 

Teacher-made tests are an important aspect of classroom instruction and
can increase or decrease academic progress, depending on their quality and
relevance. A detailed discussion of various types of teacher-made tests and
specific suggestions concerning their construction will be found in later 
chapters. 


Standardized Tests 

Most people have had some experience with a standardized test, in school,
in the armed services, or when applying for employment. In educational
and psychological measurement, standardization means a fixed or uniform
procedure in the administration and scoring of tests. A standardized test
may be [illegible] but always under the same conditions. The allotted time
is always the same and the answers to the questions are always scored in
the same manner.

Standardized tests are the [illegible] of the early efforts of persons
[illegible] method of measuring children’s [illegible] student was
[illegible] of response [illegible] nineteenth century by [illegible]. These
[illegible] and asked to [illegible] Fechner laid the [illegible]


early psychologists developed, and indeed, the discovery that
differences in the perception of identical phenomena by different people
are important. Their experimental methods provided a legacy of careful
attention to experimental methods, statistical techniques, and precision in
the standardization of testing instruments.

The authors and publishers of standardized tests attempt to evaluate their 
instruments by many of the same rigorous scientific techniques that were 
used by experimental psychology.

Another way of thinking about a standardized test is to equate it with a
recipe in cooking. If you are planning to make a crepe suzette or a chocolate 
cake, you will need instructions that have proved successful. If you follow 
these instructions exactly, your efforts should be rewarding. You should be 
able to reproduce the flavor and texture of the original recipe. The elaborate 
kitchens of many of the leading flour companies have a laboratory atmosphere. 
The exact ingredients, procedures for mixing, and conditions of baking are 
repeated over and over again to assure you, the consumer, of an accurate 
recipe. In the same way, a standardized test is subjected to rigorous experi- 
mentation on different groups of people. The conditions of testing and 
the reading of directions and the scoring are always the same. The person 
who is being tested with the X Test of Mental Abilities in Lexington,
Kentucky, will receive exactly the same directions, questions, and time to
complete the examination as the person who is being tested with the same
standardized test in New York City or London, England. Scoring of the test
will be the same no matter where the test is administered. The correct
answers are fixed and no deviation from this format because of geography,
culture, or subjective opinions of the test administrator is permitted.

If there are changes in the standardized directions then the test ceases 
to be standardized and its value is severely impaired or completely obviated. 
This is similar to the changing of a recipe. If you put in three cups of sugar
rather than one, the chocolate cake may be more to your liking than the
original recipe, but it is no longer the same cake. This is also true of the
standardized test. You may change the directions, time, or scoring and
thereby make the test more appropriate for your group of students. What you
will have then is a classroom test, maybe an excellent one, but not a standardized
test that will allow you to compare your students with other children of
similar age or grade throughout the country.

The chief value of the standardized test is to provide teachers and students
with an objective educational yardstick that can measure abilities or achievement
free of subjective error. Let us look at some common evaluation
problems found in the classroom. 

Mary is a ten-year-old in the fifth grade at Jones Elementary School.
She is very neat in her appearance and work. Her teachers have always found
her to be a “doll” and more than willing to help with projects in the classroom.
Mary’s grades have always been excellent. Last fall Mary and her
classmates were given a battery of standardized tests. Mary’s teacher was
surprised to find that Mary was only of average ability and achievement
according to the test results.

Peter, who is also ten years old and in the same class as Mary, is quite a 
different person. Peter has done poorly in school, barely passing each year. 
His appearance and written work are very sloppy. When his teacher asks him 
to help with projects he is quite reluctant and sullen in responding. Peter’s 
teacher was also surprised at Peter’s test scores. The standardized tests 
revealed that Peter was of superior ability and above average in achievement. 

Mary and Peter present us with the classical cases of possible teacher 
bias. Mary is a “model” child and the teacher likes her. Peter is sullen and
difficult to teach; furthermore, he is untidy in appearance and work. Is it
possible that the teacher’s personal feelings have entered into the evaluation
of their classroom work? 

Mary and Peter’s teacher, being human, cannot help being subjective in 
the grading of her students. The standardized test assists her by presenting 
data free of subjectivity. Mary and Peter may now be viewed from another 
frame of reference. The teacher can weigh her subjective feelings with the
objective evidence and make appropriate teaching decisions. 

It should be noted in our example that no definitive suggestions were made 
as to the new course of action open to Mary and Peter’s teacher. It is still up 
to her to make the final educational decisions. No claim is made or intended 
for the infallibility of the standardized test over teacher evaluation. The 
standardized test is only an educational tool to be used along with other edu- 
cational techniques. It provides another aid in helping the teacher and student 
make sound educational decisions. 

Let us return for a moment to our example of cooking. In our discussion 
of recipes and their scientific standardization we must not lose sight of two
factors. One is that the application of experimental procedures is as new in 
cooking as it is in educational and psychological measurement. Old recipes 
that our grandmothers used were as subjective as oral examinations in school. 
For example, if you read old cookbooks you might find this direction in a 
recipe: “Add enough flour to make a stiff dough.” How much skill in cooking
did our grandmothers need to interpret “enough” and “stiff dough”? This
is not to state that our grandmothers could not bake or that their results were
not as good or better than those of our present cooks. The basic point is that
grandmother had intuitively or subjectively to interpret recipes, whereas 
present-day cooks may follow exact directions of procedure and measurement 
of ingredients. 

Many of grandmother’s teachers were able to match the quality and re- 
liability of many of our present standardized tests. But, as in cooking, one 
doubts that they were able to produce their results consistently without
subjective error. 

The second factor to be noted is that, of course, a standardized test is 
not as reliable as a recipe produced in one of our modern experimental 
kitchens. The human variables involved make this impossible. However,
standardized tests generally appraise human behavior and capacity in a
more accurate manner than subjective devices, such as a teacher’s evaluation.

In essence, then, the standardized test is a scientific instrument that has 
been exposed to rigorous experimental controls. Its main features are 
uniformity of administration and scoring. It consists of questions that are 
factual in the sense that there is an agreed correct answer. Each test is subjec- 
ted to careful investigation by a preliminary administration, and questions 
that have been found to be poor are eliminated. The actual construction and 
criteria that must be met in producing a standardized test will be discussed
in ensuing chapters.

The reader should remember that standardized tests are not free of error.
They are not substitutes for teacher evaluation. The teacher’s own tests,
as well as ratings on class projects and daily classroom performances, are as
important as they ever were. Nevertheless, standardized tests add to her
educational arsenal of teaching devices and enhance her effectiveness as a teacher.


Why Do Schools Use Standardized Tests?

The reader has now been exposed to the differences between teacher-made 
and standardized tests. A general overview of the reasons for testing has 
been presented. Let us now explore some of the specific reasons for standard- 
ized test usage in the school. 

Grouping 

The school may use the information provided by standardized tests to 
group pupils within a particular classroom or to form classroom groups. 
Thus, tests can help the teacher handle individual differences in students by 
indicating which children have the same or similar level of skill in a particular 
subject. Students in education and psychology are constantly exposed to the 
concept of individual differences. The arrangement of students within the 
classroom or the assigning of students to certain classroom groups according 
to ability and skill is in line with our awareness of individual differences. 

In the classroom the teacher observes and experiences individual differences 
among her students. Teachers are often frustrated by their inability to give
proper instruction to the bright child and the slow student at the same time.
If the teacher is able to use an objective instrument, such as an intelligence
test, along with subjective criteria, he can group children within a class
according to their individual ability and gear his teaching accordingly.

If a school's philosophy is in agreement with homogeneous grouping of
classes, then standardized tests can help in arranging classroom groups. For
example, the knowledge gained from a mathematics aptitude test may assist
the junior or senior high school staff in placing some students in algebra and
others in general mathematics.



Let us, for illustrative purposes, look at two students, Robert and Mike,
who are to begin ninth grade next fall. Robert and Mike have taken a battery
of standardized tests which will be used along with other data in their class-
room assignment at Glen High School. Glen High School groups its pupils
into three categories (slow, average, and advanced) for instruction in
English, foreign languages, mathematics, and science. Robert's tests revealed
high general ability, high achievement in English, high aptitude for foreign
languages, but only average achievement and ability in science and mathe-
matics. Mike's scores showed average general ability, poor English achieve-
ment with little chance of success in foreign languages, and poor skills and
ability in mathematics and science.


Robert is advised to enroll in the "advanced" foreign language class
(choice of specific language is his own decision), the "advanced" English
class, the "average" algebra class, and the "average" science class.

Mike is advised to wait until the tenth grade before beginning his
language studies. The school also advises the "slow" English class along
with general mathematics and the "slow" science course.

The school’s recommendations for classroom placement are based on a 
variety of factors; these include school record, teacher recommendations, 
and results of standardized testing. Tests add the objective dimension to 
the important decision of where to place pupils. 

The reader should note two important facts. (1) Children should never be
assigned to classroom groups or within a class solely on the basis of tests.
The test adds an objective dimension to intelligent planning. It should not
be used in lieu of teacher opinion but only as another measure. The combina-
tion of teacher opinion and standardized test results gives us the best basis
for intelligent decisions. Either one used alone decreases the probability of
sound educational planning. (2) It should also be stated that even if the class
has been grouped according to ability, there will still be a wide range of
talent within the class. For example, students placed in the "average"
algebra class will probably present abilities and achievements ranging from
barely average to very high average. The barely average pupil will probably
have difficulty in mastering the material, whereas the very high average student
may find some of it very boring. Yet both are in the average group. To think
that because of grouping we have completely eliminated the problem of vary-
ing abilities is to misunderstand the process. Grouping does cut down the
spread of talent; it does not, however, eliminate it.
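
The second point can be illustrated with a small numerical sketch. The scores below are invented for illustration, and the equal-sized three-way split by rank is an assumption made for the example, not the procedure of any actual school. The sketch compares the spread of scores in the whole class with the spread inside each group.

```python
def score_range(scores):
    """Return the spread (highest minus lowest) of a list of scores."""
    return max(scores) - min(scores)


def split_into_groups(scores, n_groups=3):
    """Sort scores and cut them into n_groups of nearly equal size."""
    ordered = sorted(scores)
    size, extra = divmod(len(ordered), n_groups)
    groups, start = [], 0
    for i in range(n_groups):
        # The first `extra` groups take one additional pupil each.
        end = start + size + (1 if i < extra else 0)
        groups.append(ordered[start:end])
        start = end
    return groups


# Hypothetical mathematics aptitude scores for 12 entering ninth graders.
scores = [72, 85, 91, 95, 98, 101, 104, 108, 112, 118, 125, 134]

slow, average, advanced = split_into_groups(scores)
print("whole class range:", score_range(scores))
for name, group in [("slow", slow), ("average", average), ("advanced", advanced)]:
    print(name, group, "range:", score_range(group))
```

With these made-up numbers the spread inside every group is smaller than the spread of the whole class, yet none of the groups has a spread of zero, which is the sense in which grouping reduces but does not eliminate differences in ability.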


Special Study and Remedial Instruction

It is many times very difficult for a teacher to distinguish between general
low intellectual ability and a specialized problem in a particular skill, such as
reading. If a teacher finds that a student is doing poorly in mathematics, he
may think the student is not capable of the work. If, however, he has
evidence that the child is capable, then he may concentrate on the subject
matter and the specific difficulty within it. If, on the other hand, the evidence
reveals limited ability, instruction may proceed according to the child's
ability and the teacher may not expect as much as he would from a more
gifted youngster. If the teacher suspects a reading deficiency he may want to
utilize a reading test to ascertain the degree of deficiency. Materials and
instruction may then be provided at the child's reading level. The utilization
of tests that measure both achievement and ability assists in selecting from
within the classroom those students who have a remediable deficit.


Evaluating Capability and Accomplishment 

Some students who seem to lack intellectual ability because of poor grades 
may actually be of superior intelligence, but because of certain family problems 
and/or emotional conflicts may not be achieving up to their capacities. Tests 
can assist in providing an objective measure of capability without being
indebted to classroom achievement or subjective appraisal. 

Some examples of the use of tests in evaluating discrepancies between
potential and achievement are found in every school system. As a school
psychologist in the public schools, the author had occasion many times to
[...] through the use of standardized
[...] youngster. Donald S. was a fifteen-year-old
[...] and barely passing in
[...] counselor suggested placing
[...] substituting general mathematics
[...] Donald was in the
[...] revealed [...]. That is, Donald's intelligence
[...] per cent of children his
[...] with the objective evaluation of Donald [...] ability. Armed
[...] appropriate measures. The author worked with [...]
over a year. In the [...] and his teachers for
[...] and seemed to be the cause of
[...] Donald was slow, but by his [...] progress with
the author heard from Donald he was a [...] classes. The last
If standardized tests had not been used at [...] and doing well.
More importantly, Donald [...] have never realized his [...]

Academic Reality 

[...] registered nurse.
[...] pursue a career in [...] a career in secretarial or office
[...] is talented in office skills. Should [...] deciding which courses to take
Charles is an [...] biology. His guidance
in ninth grade. He wants to take algebra. [...] standardized
counselor notes that his [...] and language aptitude and science
[...] made.


Educational and Vocational Goals

"Should I go into a [...]
asked by [...] to help answer these [...] college can be greatly
American College [...] and in the choice of [...]
[...] kind of education [...]

Even with the best [...] one school. Schools vary
and often [...] comparison is [...] range of students
throughout the United States in [...] student stands in any subject or
[...] determine how a [...] compared.

[...] standardized aptitude [...] interests. James has
[...] to go to college. He feels his abilities and interests lie in a
mechanical occupation. His aptitude tests show good eye-hand coordination
and finger dexterity. His scholastic aptitude test scores are about [...].
The interest inventory reveals interest in mechanics and science. His [...]
and plans correspond to the objective test data. James is, therefore, probably
pursuing appropriate goals.


Transfer Students 

Many schools, of course, may assign transfer students on the basis of
previous school achievement, grade, and age. This is, however, a difficult
procedure in some instances. Schools differ, as has been previously mentioned,
and a child of ten in fifth grade at X school should not necessarily be placed
in fifth grade in Y school. Standardized tests may aid the school in placing 
the new student in the grade and group appropriate to his needs and level of 
achievement. It also should be stated that some schools are remiss in keeping 
sound records and/or forwarding them in time for placement. Thus a school 
may find it necessary to make an evaluation of a new student for placement 
with little or no record of his former achievement. 


Discovering Educationally and Socially Maladjusted Children

In every school there are some students who present severe problems of 
educational or social adjustment. Among some of these types are the with-
drawn, the unhappy, the mentally retarded, and others who are not adjusting
to the pattern of the school. The standardized test renders assistance to 
teachers and counselors in their attempts at understanding and helping these 
children. The following case is only one among many different possible uses 
of tests in this area. 

Albert is a seven-year-old boy in the second grade. His teachers report
that he is an isolate without any school friends. During recess he is usually 
found sitting alone on a swing or on the school steps. His academic progress 
is poor. Albert never creates any classroom problems. He usually sits in the 
back of the room with a stoical expression on his face. No matter what the 
teacher does, Albert reveals little emotion. Albert's teacher is not sure she 
will be able to promote him to third grade. Is Albert mentally ill? Is he 
just passing through a stage? Is he mentally retarded? Or is he an eccentric
individualist? 

The school psychologist is asked to evaluate Albert and make appropriate 
recommendations. A good portion of the school psychologist’s evaluation will 
rest on the basis of special personality and intelligence testing. The test 
results will produce answers to the school’s questions about Albert. They 
may then institute appropriate actions which will help Albert realize his
maximum potential for academic and personal actualization. 



Research Uses 

"The individual child is our [...]
voiced by almost [...] reflected in measurement devices
philosophical orientation. [...] directed at this objective. That is,
and goals. Our discussion thus far has [...] school individualize
the direct measurement of factors [...] guidance for each individual pupil,
instruction and provide educational [...] educational environment
The use of measurement [...] immediately affect individual students,
for future curricular changes [...] designed to understand and develop
Research into all areas of [...] student in a more effective manner,
new approaches to serve each [...] immediate significance but can serve
Often these investigations [...] need to be fitted together
as a piece or pieces of [...] us briefly review some examples
in order to serve our student [...]
of educational research [...] six-year longitudinal study in
[...]
The implications [...] was of paramount [...] groups,
are self-evident. Standardized [...]
it served as objective [...] readiness [...] studies, and other
Instruments such as [...] appraisal of student ability and
special achievement tests [...]
areas of instruction [...] school social studies
progress. [...] the effects of an elementary [...] found that
Taba (1966) investigated [...] to make
curriculum which [...] in the ability [...]. At the same
students and to apply [...] standardized achievement
time these students [...] curricular objectives [...] pupils but
of direct importance for future [...] to
present [...] future generations [...]


McGuire (1968) reviews the literature in this area covering aptitude,
achievement, attitude, interest, and personality tests as well as research
into new types of tests and follow-up studies. His conclusions are a challenge
to future investigators: "As of this date it appears that at the level of pro-
fessional education much effort has gone into the development and frag-
mented study of predictors of success without comparable attention either
to the construction and validation of reliable criterion measures or to the
investigation of environmental influences on success. The challenge is
clear!" (p. 58).

The utilization of standardized tests in the area of school curricula should 
proceed with the utmost caution. In evaluating curriculum and/or curricular
experiments the school should keep in mind the objectives of the school and
those of the tests. For example, if a program in modern mathematics is
instituted, a test of modern mathematics should be used to evaluate progress,
not a test used to measure competence in traditional mathematics.

If tests are used with intelligent application to educational objectives they
can serve as valuable guideposts in measuring educational outcomes. It
should also be noted that in any curriculum, modern or traditional, there are
common basic skills that are necessary.

Some other areas of research that utilize standardized testing are: evalua- 
tion of school children in terms of psychological and educational development 
and growth; the psychological growth and educational development of
retarded and gifted youngsters; and, of course, continual evaluation and 
testing of the instruments used to assess the skills, abilities, and other topics 
presented in this chapter. 

In essence, then, standardized testing is a valuable tool in research en- 
deavors that attempt to provide answers to educational questions. Testing 
by its very nature is wedded to research and an on-going appraisal of what 
is and what may be. 


Areas Where Standardized Tests Should Not Be Used 

The reader should be aware that the "red light" areas of testing which
will be discussed are not agreed upon by all authorities. It is safe to state,
however, that most would at least consider them "yellow light" spheres
which are to be cautiously considered.


Assigning Grades 


[...] standardized test results as evaluative [...].
Standardized test content is general and not
specifically related to the local school curriculum.



Evaluating the School


It is true that standardized tests may help in assessing schools and classes
in terms of aptitude and background characteristics, for they may function
as a beacon to focus on strengths and weaknesses of specific schools and
classes. This is a dangerous procedure, however, for there is a strong tendency
of central administrations to use the results in a judgmental or punitive
manner. This may promote the "teaching for tests” syndrome, which is 
undesirable. It is undesirable because the local school system loses its 
autonomy in developing its own educational objectives. The development of 
curricula is then turned over to testing people. Teaching for tests also tends 
to be superficial, emphasizing "correct answers” rather than understandings. 
As previously stated, schools in particular geographic areas with particular 
students have particular needs and need to develop their own particular 
curricula. National standards cannot and should not be educational yard-
sticks for all school systems. This is not to say that there are not underlying
common denominators in the basic skills, or that national standards are 
meaningless. They do provide us with a common frame of reference that is 
necessary in guiding individual students in educational and vocational goals, 
particularly when they will be competing with students from a broad geo-
graphical area. They do, however, lose much of their significance when
we use them to judge the local school rather than to aid in individual
guidance. 

To illustrate, let us look at two very different schools. School A is situated 
in an urban setting with the majority of students from a disadvantaged back- 
ground. The educational objectives of this school are to enrich the im- 
poverished lives of its students and to teach basic methods of living in a 
twentieth-century urban complex. In addition, this school also teaches the
classical academic subjects and skills. To assess its educational worth by the 
use of a nationally standardized achievement test is to measure only one 
aspect of its total program. Certainly the standardized test is applicable as 
individual guidance for the students of School A. For example, those students 
who desire to go on to college must know their chances of success based on 
national standards rather than on the local norms, where competition may 
not be as difficult. 


Let us now turn our attention to School B, which is the exact opposite of
School A. It is situated in an upper-middle-class suburban community. The
majority of the students are above average in intelligence and achievement.
To judge the worth of the whole school on the basis of nationally standardized
tests would be to inflate unrealistically the school's accomplishments. One
would expect School B to be above the national standards, given its setting
and student body. This school must develop its own internal methods of
judging its particular curriculum. Of course, it must be stated that for
individual guidance the national test helps the individual student gain a
realistic picture of how he compares to other students throughout the
country. He may be only average at School B but above average on a national
scale. This would have special meaning in planning educational and voca-
tional objectives. It would not, however, give the central administration
meaningful data to assess its whole school system or curriculum.
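
The contrast between local and national standing can be made concrete with a short sketch. Everything here is hypothetical: the two lists of norm-group scores and the student's score are invented, and the percentile-rank rule used (percentage of the norm group scoring below the student, with tied scores counted as half) is one common convention assumed for the example, not a rule taken from any particular test manual.

```python
def percentile_rank(score, norm_group):
    """Percent of the norm group scoring below `score`, counting ties as half."""
    below = sum(1 for s in norm_group if s < score)
    ties = sum(1 for s in norm_group if s == score)
    return 100.0 * (below + 0.5 * ties) / len(norm_group)


# Hypothetical norm groups (invented scores, for illustration only).
school_b_norms = [55, 58, 60, 62, 64, 66, 68, 70, 72, 75]
national_norms = [35, 40, 44, 48, 50, 52, 55, 58, 62, 70]

student_score = 64
local = percentile_rank(student_score, school_b_norms)
national = percentile_rank(student_score, national_norms)
print(f"local percentile rank: {local:.0f}")
print(f"national percentile rank: {national:.0f}")
```

With these made-up figures the same raw score falls near the middle of the School B group but near the top of the national group, which is the situation described for the student who is only average locally.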


Teacher Evaluation 

School administrators should never use the results of standardized tests to 
evaluate the competence of individual teachers. This procedure is wrong on 
several counts. First, it does not take into account the fact that class achieve- 
ment is, in part, the result of previous educational history. It is not edu- 
cationally valid or fair to judge a teacher who has taught a class for a semester 
or year as solely responsible for the student’s academic performance. Secondly, 
innate intelligence and family cultural experiences contribute greatly to 
educational achievement. Thirdly, as has been mentioned previously, an 
achievement test measures only a small portion of the objectives of most 
schools. Fourthly, it is likely to cause teachers to teach for tests rather than 
pursue the local educational and curriculum goals. Teachers, being human, 
will tend to neglect areas of the curriculum that are not conducive to measure- 
ment — such areas as, for example, citizenship, discussion, and verbal ability. 
Skills and learning that are easily testable and found in many standardized 
achievement tests often become the focal point of teaching. 

Evaluating teachers on the basis of test results not only blunts the horizons 
of teaching effectiveness, but demoralizes teachers. Teachers react to 
evaluative pressure in many ways. Some teach for the tests by securing old 
tests and concentrating on areas that are covered. Others “prepare” their 
students by drilling them on the exact questions or similar questions that
are to be presented. Still others are so pressured that they give the exact
questions and answers that are to be asked and require students to memorize
them. Though these procedures cannot be condoned, they are certainly
understandable when the administration places the burden of teacher
evaluation on class test scores.

Finally, it should be mentioned that not only is this kind of evaluation 
wrong for teachers and their students, but it leads to poor testing practice.
The value of the test is lessened, if not destroyed, by excessive evaluative 
emphasis. The basic purpose of testing is to render educational assistance 
to the student, the teacher, and the school. 

In summary, it can be stated that tests are useful educational instruments
in helping children develop and realize their own individual potential.
Standardized tests should be used as objective evidence to supplement the
teacher's subjective judgments. They are only one of many educational
tools in helping children. They are not substitutes for teacher evaluation.
The teacher's own tests, ratings on class projects, and a child's daily class-
room performance are as important as they ever were. Standardized tests used
properly can be of immense assistance to the teacher and to his pupils.



References 



McGuire, C. H. Testing in professional education. Review of Educational Research:
Educational and Psychological Testing, 1968, 38, 49-60.

McKee, P., and Brzeinski, J. E. The effectiveness of teaching reading in kindergarten.
U.S. Department of Health, Education, and Welfare, Office of Education,
Cooperative Research Project No. 5-0371. Denver: Denver Public Schools and
Colorado State Department of Education, 1966.

Taba, H. Teaching strategies and cognitive functioning in elementary school children.
U.S. Department of Health, Education, and Welfare, Office of Education,
Cooperative Research Project No. 2404. San Francisco: San Francisco State
College, February, 1966.




CHAPTER 

2 

Contemporary Issues 
and Problems 


Major Criticisms 

The critics of testing have voiced their positions loud and long. Psychol-
ogists, educators, sociologists, and scholars from other disciplines as well
as lay people have been critical of testing. Many of the criticisms have been
characterized by half-truths, personal bias, and misunderstanding of testing
principles and objectives. The criticisms fall roughly into four categories:
those that deal with the effect of testing on the individual, the effect of testing
on institutions, the effect of testing on society in general, and test items per
se. Let us look at some of these criticisms and see what the charges are.

Testing and the Individual 

There are several charges against testing that deal with the effect of testing
on the individual. The first of these charges that we shall consider is that tests
damage an individual's self-esteem.

Damaged Self-esteem

The critics of testing feel that testing may predetermine an individual's
social status and harm his image of himself. They state that testing places
[...]
"Why should I try harder?" [...] to a person's self-esteem
or measurement instrument [...] Today's intelligence tests measure many
permanent general ability [...] the influence of an individual's cultural
aspects of human behavior, including [...] learn and do one's best
background, school achievement [...] intelligence test is a measure
psychologists [...] that an individual's past
[...]
and social molds, then [...] On the other hand, it
that competent psychologists [...] test users are not psychologists [...] therefore,
however, he will then [...] understand the [...]
[...]
does more harm than good [...]
diets. Yet we know that [...]
have severe gastrointestinal problems as a result. No one seriously proposes
banning milk because some people have allergies to it or do not consume
it in proper quantities.

The student of measurement should not take a position that [...]
the possibility of harm to a child because of testing, but he should be aware
that testing is only one phase of the educational process and that if it is used
properly it may be of great value to most children.


Limited Evaluation 

The criticism of limited evaluation of tests recognizes that a human being 
is a complex organism consisting of many different aspects of talent, person- 
ality, and motivation. The charge is that the use of one type of test, usually 
the kind dealing with verbal and quantitative skills, taps only a small portion 
of the abilities of most individuals. The implication is that this is the only 
measure used by schools and colleges in sorting students and that therefore 
much talent is lost to the society. 

Modern educators are quite aware of the fact that many students have
talents that are neither verbal nor quantitative, and contemporary education
thus envisions more than "reading and 'rithmetic."

The answer to this test criticism again lies in the intelligent utilization 
of tests. A good evaluative program uses many measures in the evaluation of 
an individual. In the area of college admissions, for example, the college that 
has an admission or scholarship award policy based only on verbal and 
quantitative test scores is using testing improperly. The publishers of these 
tests have stated repeatedly that scores on these tests should not represent 
the sole consideration for admission or scholarship awards. 

The proper procedure is the utilization of many different kinds of measures. 
In a complete evaluation of an applicant the college admission officer would 
want to know a candidate's past test scores in educational areas other than
the linguistic and mathematical. He would want to know about the candidate’s 
school record in terms of grades, courses, extracurricular activities, and 
teacher ratings before making his final decision. In this way the verbal- 
quantitative test is only one measure among many criteria in the selecting 
process. This principle applies, of course, at all levels of educational 
evaluation. 


Abridgement of Human Choice 

The third criticism of tests dealing with the effect on the individual is
of a philosophical nature, and reflects our general fear that machines may rule
man. Tests reduce people to numbers, state the critics of testing. They go on
to warn that the use of tests may reduce human choice and action. This is a
serious charge. Human freedom of choice is essential to our form of govern-
ment and underlies most, if not all, of our educational philosophies. Any
procedure or instrument that would take this precious right away is a danger
to our society. Do tests constitute such a danger? Are they in fact a menace
to human freedom of choice?

Let us look at the situation as objectively as possible. If we do this we
must admit that tests could possibly pose such a menace if used improperly.
Of course, most things if used improperly could endanger freedom. The
United States government, for example, has data on each of us that could be
used by the wrong people in a punitive and dictatorial manner. Yet this data
is necessary to run the government and provide services to the whole society.
There is no doubt that this data could provide the means to abridge freedom,
as could some offices of government. Tests are no different. Used improperly,
without proper respect for the individual and knowledge of the inherent
limitations of tests, grievous wrongs, such as the limitation of freedom of
choice, could occur.

To some the use of mental tests implies that human behavior is as measur-
able as the dimensions of a table top. Given this frame of reference, the task
of the test author would be relatively simple. Tests could be developed
around finite goals and the factors that contribute to these objectives could be
isolated and then measured. When factor X was mixed with factor Y the
outcome would always be XY. Human behavior could be predicted, measured,
and controlled with the same certainty and facility that one finds inherent
in chemical solutions and reactions.

If the preceding were true, that is, if human behavior were as measurable
as a table top, then the danger of eliminating human choice would be great 
indeed. These things are, of course, not true. Our testing instruments are 
far from perfect. Psychologists cannot isolate all the variables that contribute 
and give direction to human actions. 

Under these circumstances the problem of testers playing God or of
impersonal evaluation becomes quite remote. Of course it must be recognized
that the improper education of test users could contribute to the improper
use of measurement instruments. If the tester does not know the limitations
of his instrument and is afflicted with megalomania, he may indeed act out
the role of a deity and assign youngsters to given educational slots on the
basis of test results.

Again, the answer to this criticism is the proper education of the test user
and consumer.

Psychologists and educators are painfully aware of the limitations of tests.
They know and appreciate the fact that a great proportion of decision making
must be made in the maze of uncertainties, desired outcomes, and [...]
of achieving these outcomes. They also know that mistakes will be made.


Testing and Institutions 

[...] an almost paranoid quality and anxiety to the critical vendettas
hurled at testing. Is there any reality to the charge? The [...]

This writer has observed a subtle form of undue test influence in his duties
as a counselor and psychologist in the public schools. He has seen teachers
teaching for tests, principals bragging about their school's test record, [...]
state where the Department of Education requires certain prescribed tests
from K-12. Is it true, therefore, that testing people control educational goals
and the destinies of individuals?

Today the answer is no, but if proper measures, such as the education of
those who use these instruments, are not instituted the answer in the future
may be yes. American education at the present time is too decentralized and
loosely organized to control the lives of its students. In some European
countries, of course, this is not true and testing (usually of the subjective
essay type) is of paramount importance in the lives and destinies of students.

In the United States tests usually follow rather than lead in curricular
change. Though tests may affect a phase of a student's educational experience
it can be stated with some certainty that they do not determine his destiny.

Let us explore the basic problem of control and what it means in measure-
ment terms. The teacher and/or principal who provides an atmosphere or
situation of competitiveness on tests is laying the groundwork for future
control. For example, Mr. Jones, principal of Northern High School, gives
his teachers a testing pep talk:


Mr. Jones: Ladies and gentlemen, last year Northern's test scores
in mathematics and social studies were, on the average, the lowest in
the county. We must be doing something wrong in these fields. Our
kids are as smart as those in the other schools and I know our faculty
takes a back seat to none.

Well, I think I've got the answer to help our kids make a better show-
ing. I've asked our director of guidance to secure some old mathematics
and social studies tests which are similar to the ones we give our kids
in the spring. I want you to study them and gear your teaching accord-
ingly. I know the other schools must be doing the same thing and I feel
sure if we do we may even beat them. O.K., let's give it our best and
show the other schools what kind of teachers and students Northern
has.

The preceding illustration is a realistic possibility, although most principals
would probably be more subtle in their exhortations. The teachers of Northern
are now caught up in the teaching for the test syndrome. If they do not gear
their teaching for the prescribed tests and their students do poorly, they
open themselves to the possible loss of their position; at the very minimum,
to the displeasure of the administration.

This kind of situation does a major disservice to measurement, education,
and local control over curriculum. An accurate picture of individual and
group performance is compromised by concentrating on the format and
content of these tests. A true picture of the educational level of the students
is at best tentative. Educational goals and objectives are reduced to recognition,
memorization, and ability to respond to a test rather than true learning, which
incorporates not only memorization of data but integration of this data
into the student's thinking and behavior.

Of even more serious concern is the relinquishing of local and regional 
curriculum control. The task of devising the curriculum is handed over to a 
national testing center. This center, no matter how sincere and dedicated, is 
in no position to formulate educational objectives for all schools throughout
the United States. Not only do individuals differ, but so do regions and 
schools. The curriculum at an upper-middle-class suburban school may not 
be appropriate for an urban school in a disadvantaged area. 

In addition to individual differences our educational system has tradition- 
ally been based on local control by the citizens of the school district. Schools 
in a democracy are intended to reflect the needs and desires of the com- 
munity. If the school bases its standards on the content of national tests, 
this tradition is seriously compromised. 

Educators and psychologists are very much aware of the fallacy of using 
tests to measure the worth of a school. They know that conditions and 
objectives differ from school to school. They are also painfully aware of the 
limitations and fallibility of standardized tests. No responsible educator or 
psychologist advocates the use of tests alone to judge teachers or schools. 
The problem again lies not with testing, but with the improper use of tests.

In essence, then, educational personnel using tests must be aware of the
limitations of measurement and proceed with the utmost caution in deriving
inferences from test scores. This does not mean, of course, that national tests
are of no value. Certainly, there are basic skills and data that students at
given grade levels should know no matter where they reside. If test scores
are evaluated within the educational objectives of the school and are interpreted
in the light of the local school setting, they will add to the total
evaluation. The important point to remember is that tests are only one general
method of evaluation and must be considered in the light of the student population
and the school curriculum.

Thus we may state that the criticism of test control of education and
people is generally not true today. In some isolated cases, however, it is
true that uninformed teachers and principals allow tests to exert an undue
influence within their schools. The solution to this situation, as in others,
is more measurement education for test consumers. At the same time, all
of us (psychologists, teachers, principals, and test publishers) must maintain a
constant vigil against the encroachment of tests dictating educational policy.


Testing and Society 

Another area of criticism attacks the entire testing movement and often
has as its goal the abandonment of testing. Let us examine these general
charges in detail.





Cultural Bias

The criticism of measurement, especially intelligence testing, which holds
that tests are culturally biased is fraught with emotional overtones and
half-truths. The heat engendered in this critical sphere has recently been
intensified by the civil rights movement. The critics state that testing is geared
to middle-class values and is, therefore, unfair to children of less fortunate
environments. They cite the lack of enriching developmental experiences of
disadvantaged children compared with their more fortunate middle- and
upper-class peers. To these critics our present tests are culturally biased.
The tests, according to them, not only reflect middle-class concepts but use
middle-class language, and assume all students have been exposed to
middle-class experience.

The late Dr. Loretan (1966), former Deputy Superintendent of Schools 
for New York City, was instrumental in eliminating the use of IQ tests in
the New York City schools. His basic reason for dropping IQ tests was the 
problem of cultural bias. He states, “the vocabulary and concepts in most of 
the group IQ tests are foreign to many children in our large and varied 
country (and certainly to many children in New York City). We have not 
been able to extract cultural biases from our tests, and yet we use these tests 
with children who are culturally different” (p. 6). 

Rosenberg (1966), in an address at the 1966 District of Columbia Psychological
Association meetings, applauded the New York City school system's
decision to eliminate intelligence testing. He stated,


I feel that there has been some recent heartening news from such 
areas as New York City, where, as you all know, intelligence testing 
was essentially thrown out of the school system. This may sound 
like an extreme measure, but if the abuses in the New York School 
System were anywhere as bad as the abuses that occur in the Baltimore
City School System, then I say, “Better throw out the baby with the 
bath” and start again, if there is the slightest chance we may produce 
something better. To continue to damage children by inaccurate 
decisions based on our inadequate procedures is to contribute to 
what might well be called the scientific racism of the 1960’s. 


Rosenberg goes on to state that Drs. Wechsler and Anastasi (both well- 
known and highly respected psychologists) are correct in their assessment 
that “intelligence tests are correct, what is wrong is in society.” However, 
he feels that they do not go far enough. He feels that the labeling of
“deprived-area children with low IQ’s condemns them to a status where they will be
offered much less in our school systems.” Rosenberg states further, “it 
seems to me that our assumption must be that in the poverty area the 
intelligence of children is actually distributed normally as it is at higher 
socioeconomic levels.” 

Anastasi (1966) questions the logic of eliminating items that differentiate
groups of people. She asks, 



. . . stop? We could with equal justification proceed
to rule out items showing socioeconomic differences, sex differences,
differences among ethnic minority groups, and educational differences.
Any items in which college graduates excel elementary school graduates
could, for example, be discarded on this basis. Nor should we
retain items that differentiate among broader groups, such as national
cultures, or between preliterate and more advanced cultures. If we do
all this, I should like to ask only two questions in conclusion. First,
what will be left? Second, in terms of any criterion we may wish to
predict, what will be the validity of this minute residue [pp. 456-57]?


Lorge (1966) states that test makers may have attempted to eliminate bias 
from intelligence tests by using different processes in appraising different 
groups. He feels these efforts have not removed bias from measurement. It 
is his position that “ignorance of difference is a costly way to produce
unbiased tests of intelligence” (p. 470).

Rosenberg and others who have voiced their criticism of the cultural
bias of tests are correct in one sense: all of our present tests are
culturally biased. If they were not they would have little value in our school
system.

The problem that besets this area of debate is twofold: (1) the absence of
a common frame of reference in definitions of terms and (2) the inability of 
the critics to face reality. Rather than do this they cloud the issue with the 
illusion of what one would like things to be. 

Let us look at the first problem, definition of terms. The labeling of
group tests of school ability as intelligence tests is a grave error. This gives
the impression to many people that what is being measured is innate
intelligence, when in fact what is measured is chance of success in school.
Psychologists have recognized for some time now that the term intelligence
test may be a misnomer. Today we say that intelligence tests are tests of
scholastic aptitude. Nothing concerning innate intelligence is stated or
implied in this label.

The second major problem is the unwillingness of the critic to face reality.
Statements such as Rosenberg's that the children in poverty areas have the
same normally distributed intelligence as those in higher socioeconomic
levels seem to this writer an unscientific supposition.

It could be stated that a disadvantaged environment tends to inhibit the
growth of normal intellectual functioning and this blockage begins from the
time of birth. The constitutional theorist may, on the other hand, state
with equal fervor that there is a tendency in nature to breed true to type.
That is, people who for one reason or another are unsuccessful tend to
choose mates of similar backgrounds, thus [...], whereas those who rise
above the poverty [...] similar strivings or those that are [...].
Later in this chapter we will deal in detail with the



Testing Program for Two High Schools

Hess High School

Ninth grade:
  1. school ability test
  2. interest test
  3. general school and vocational aptitude test battery
  4. achievement tests in mathematics, social studies, and English

Tenth grade:
  1. school ability test
  2. general personality test
  3. achievement tests in mathematics, social studies, and English
  4. music and art aptitude tests

Eleventh grade:
  1. school ability test
  2. achievement tests in mathematics, social studies, and English
  3. national college aptitude test

Washington High School

Ninth grade:
  1. school ability test
  2. culture-fair test*
  3. interest test
  4. general school and vocational aptitude test battery
  5. achievement tests in mathematics, social studies, and English
  6. reading test

Tenth grade:
  1. vocational aptitude tests in specialized areas such as mechanical
     aptitude, manual dexterity, and stenographic aptitude
  2. reading test
  3. attitude questionnaire

Eleventh grade:
  1. school ability test
  2. culture-fair test

* A culture-fair test is an instrument which attempts to measure experiences that
are equally familiar or unfamiliar to all groups of people.


relationship between intelligence and such variables as genetics, environ- 
ment, and ethnic background. 

This writer would not make a definitive statement concerning the distribution
of intelligence in a disadvantaged area because of the many factors that
impinge on this type of evaluation. It is easy, however, to state quite
definitely that there is no data to support Rosenberg’s claim of a normal
distribution of intelligence; on the contrary, the clinical evidence and
“intuition” of many authorities would lead one to the opposite position.

The important point to remember is that it is easy for our subjective
predilections to interfere with our objectivity. In the long run this interference
can hurt the very cause and people we want to help.

What is cultural bias in tests? First of all, the language used is reflective
of the culture. Secondly, the required test tasks are reflective of the skills
and objectives of our schools, which in turn should be reflective of the skills

Testing Program for Two High Schools (continued)

Hess High School

Eleventh grade (continued):
  4. general school and vocational aptitude test battery

Twelfth grade:
  1. school ability test
  2. achievement tests in mathematics, social studies, and English
  3. national college aptitude test
  4. national scholarship test
  5. advanced interest test

Washington High School

Eleventh grade (continued):
  3. general school and vocational aptitude test battery
  4. reading test
  5. national college aptitude test
  6. achievement tests in mathematics, social studies, and English

Twelfth grade:
  1. General Aptitude Test Battery (produced by Bureau of Employment
     Security, U.S. Department of Labor)
  2. national college aptitude test
  3. national scholarship test
  4. attitude questionnaire
  5. advanced interest test

and objectives of [...]. [...] culturally biased, but they [...] student's
academic [...].

The basic purpose of [...] measure of [...] achievement in our schools [...]
cannot be improved [...]. [...] test elimination. [...] tests need a great
deal [...] test construction [...]. Cultural handicaps will not be removed
by changing [...]; only changing the conditions within our society that
promote these deprivations will do [...].



Too Much Testing

Some critics complain that children are tested too much. In answering
this charge it is very difficult to define how much is too much. It should
be remembered that tests are not always correct measures of ability. The
more information we have on a student, the better we are able to assist him
and the less is the chance of error. The important question is what
information is needed to help children develop themselves to their
fullest potential.

Let us look at two four-year high schools in two different settings and
review their standardized testing programs. The preceding outline (pp.
28 and 29) presents these schools and their testing programs. Hess is a
suburban high school located in an upper-middle-class area, whereas
Washington is a city high school situated in an underprivileged setting.

(Before reading further, read and review the two testing programs cited 
with the following questions in mind: 

1. Is there too much testing at Hess, at Washington, or at both? 

2. If there is too much testing, what tests would you eliminate and 
why? 

If you feel that your knowledge is insufficient to enable you to judge, remember
that most critics of testing are no more prepared than you, and a great many
do not have as good a background as yours. Discuss your feelings and
thoughts with your classmates and see what they have to say. After you
have done this the following discussion of the issues and problems of “testing
too much” should be more meaningful.)

Let us start our discussion and analysis of Hess and Washington schools
with one of the basic premises of measurement: What information is needed?
Hess administers seventeen tests and test batteries over a period of four
years, whereas Washington administers twenty tests and test batteries over
the same period. Is either of them, or are both, testing too much? It is
obviously impossible to add the number of tests and declare the school with
the most the winner of the “too many tests” trophy. The appropriate and
most valid method of appraising a school’s testing program is to start with
the school and its educational goals and objectives. Note that the present
analysis will deal only with the question of the quantity and necessity of tests,
not with the “ideal” program. For a more complete and detailed discussion of
testing programs see Chapter 17. The answer to our question is that Hess
High School is testing too much. There is no need to give a school ability
test every year. This is especially true in an advantaged area. A school
ability test given in ninth and eleventh grades would be more than enough.
The majority of children at Hess will be going to college and will take one of
the national college aptitude tests, another form of school ability test, in
the eleventh and twelfth grades. Clearly Hess is in error in administering
so many school ability tests.
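
The counts of seventeen and twenty quoted above can be checked with a short tally. The sketch below is only an illustration: the short test labels and the assignment of each test to a grade are assumptions read off the outline on pp. 28 and 29, and the function names are hypothetical. It also shows in which grades each school gives a school ability test, which is the point the analysis turns on.

```python
# Tally of the standardized tests in the two programs outlined on pp. 28 and 29.
# Assumption: the grade groupings and the short labels are an illustrative
# reading of the printed outline, not an official listing.
HESS = {
    "ninth": ["school ability", "interest", "aptitude battery", "achievement"],
    "tenth": ["school ability", "general personality", "achievement",
              "music and art aptitude"],
    "eleventh": ["school ability", "achievement", "national college aptitude",
                 "aptitude battery"],
    "twelfth": ["school ability", "achievement", "national college aptitude",
                "national scholarship", "advanced interest"],
}

WASHINGTON = {
    "ninth": ["school ability", "culture-fair", "interest", "aptitude battery",
              "achievement", "reading"],
    "tenth": ["specialized vocational aptitude", "reading",
              "attitude questionnaire"],
    "eleventh": ["school ability", "culture-fair", "aptitude battery", "reading",
                 "national college aptitude", "achievement"],
    "twelfth": ["general aptitude battery", "national college aptitude",
                "national scholarship", "attitude questionnaire",
                "advanced interest"],
}


def total_tests(program):
    """Count every test or test battery given over the four years."""
    return sum(len(tests) for tests in program.values())


def grades_giving(program, test_name):
    """List the grades in which a given test appears."""
    return [grade for grade, tests in program.items() if test_name in tests]


if __name__ == "__main__":
    for name, program in (("Hess", HESS), ("Washington", WASHINGTON)):
        print(name, total_tests(program), grades_giving(program, "school ability"))
```

The raw totals say nothing about appropriateness; the per-grade listing is the more useful output, since it makes the yearly repetition of the school ability test at Hess visible.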



There is also no need to [...], especially in Hess's [...]. [...] want some
general assessment [...] students compare nationally. This is valid and
could be [...] testing twice, either in the [...] twelfth grades. Local
tests [...]. [...] of art and [...] program is generally not [...] Hess.
High school students from [...].

[...] of one or two of the tests [...] necessary and meaningful. [...]
valid to state that the [...] tests. Therefore, Washington does [...].

In the ninth grade [...]. Note that three of the six tests are school
ability [...] underprivileged setting. In a different area, three school
ability tests in the same year would be too many. At Washington this is
not [...] gauge these students in comparison [...] ninth graders throughout
the United States. [...] the culture-fair test [...] disadvantaged groups
than [...] less culturally [...]. [...] at least an average [...]
fifth-grade level may expect trouble. [...]




tests each year would then be a reflection of the added emphasis Washington
gives to reading.

In summary, it can be stated that our most important concern in judging
whether a school tests too much is not quantity, but the appropriateness of
the tests for a particular school in a particular geographic area with a particular
student body and their own particular educational goals and objectives. In
the preceding example we saw that the school that gave the most tests was
not testing too much, whereas the school that gave fewer tests was indeed
subjecting its student body to more tests than necessary.


Test Items and Specific Tests 

The critics who challenge the validity of test items and of specific tests
often pick one or two items from a test, show that they are poor, and then 
attempt to discredit the entire test and all tests of that type on the basis of 
these poor items. 

Multiple-Choice Tests 

The preceding technique has been the means for attacking the multiple-choice
test. The critics state that multiple-choice tests favor the superficially
intelligent and penalize the student who has depth, subtlety, creativity, and
critical powers of discrimination. This statement has been voiced in many
ways and at many different times. One of the most vocal and literate of these
critics is Dr. Banesh Hoffmann. Our attention in this section will be mainly
devoted to Hoffmann’s criticisms because he is the most articulate of the
test critics.

Dr. Hoffmann’s contention that "ambiguous" questions penalize students, 
especially the "deep” thinkers, was investigated by Black (1963). Before 
discussing Black’s findings let us look at an example of a question that Dr. 
Hoffmann considers ambiguous. 

This question, according to Hoffmann (1962, p. 86), is from the Iowa 
Tests of Educational Development, Test 3, Part II, p. 9. 

The student is asked to find the word that is spelled incorrectly. If there 
are none, he is to mark the last number. 

1. cartons 

2. altogether 

3. possibilities 

4. intensionally 

5. none wrong

The correct answer is intentionally. Dr. Hoffmann, in citing this item,
illustrates its deficits by quoting from a letter of a National Merit Scholarship
winner. The correspondent stated that he chose none wrong because
[...] correctly.

The preceding example is an illustration of pedantry. Let us look at all
facets of the question and then you may decide if the question does indeed
penalize the deep student. First, we must admit that one constructing a
spelling test should himself be able to spell; further, he should check on the
word he intends to be misspelled. Hoffmann’s observation that the question
is poor is, therefore, granted. His further contention that this question
penalizes the “deep” student or that test authors hide tests behind statistics
and that therefore multiple-choice tests are not valuable is not conceded.

The student of measurement should know that the test is intended for the
seventh through the twelfth grades. It is extremely unlikely that the vast
majority of students, no matter what their intellectual level of functioning,
would be familiar with the rather obscure word intensionally as used in
logic.¹ Further, when one examines the words in the test item, the grade
range is evident. The words cartons, altogether, and possibilities are obviously
at a lower level of vocabulary and educational development than the word
intensionally. Therefore, the exceptionally “deep” student might ascertain
that the test author or printers made a mistake and mark intensionally as
incorrect because they probably meant to misspell the word intentionally.
Let us go further, for purposes of discussion, and grant that the “deep”
student missed this question because of his greater knowledge. Would he
in fact be penalized? Black (1963), in seeking the answer to this question,
examined items of the Scholastic Aptitude Test (SAT) to find ambiguous
questions. In addition, he discussed the problem with personnel of Educational
Testing Service. Nine items out of fifty-five were considered ambiguous
by Black and an official of the Educational Testing Service.²

The next question that Black was concerned with was how many “wrong”
answers he had given to the questions that offered more than one correct
response. He found that he had missed only three of the “ambiguous”
questions. “What this meant in terms of my SAT score was that I had been
penalized 24 points on a test scaled from 200 to 800. Put another way, I had
just become two percentile points brighter. The difference was certainly
not enough to keep me out or get me admitted to most colleges” (pp. 115-116).

The issue in multiple-choice testing is not whether these tests have reached
perfection, nor is the issue whether there are poor and ambiguous questions.
Multiple-choice tests are not perfect, or anywhere near perfect. There are
items that do penalize “deep” students. Ferris (1962), in replying to a
criticism of Dr. Hoffmann’s concerning a physics question that he had
constructed for Educational Testing Service, agreed that the question was
ambiguous and “a good student might be disturbed by its ‘ambiguity.’”

¹ Webster's New Collegiate Dictionary (1961) defines intension as “1. Now Rare.
Tension. 2. Intensity. 3. Intensification. 4. [...] as of the mind;
determination. 5. Intensiveness. 6. [...] qualities, or characteristics
comprised in a concept or [...]” (p. 435).

² Publishers of the College [...] United States.

The important question is whether multiple-choice tests such as the College
Board Examination discriminate between different levels of ability. If in 
doing so the “deep" and “creative” student is penalized, then obviously 
Hoffmann and other critics are correct. Choosing a dozen or more questions 
from tests over a period of years and showing their ambiguity or inaccurate 
content does not prove the case against multiple-choice testing. It only reveals 
that some questions were poor. It does not reveal the good questions, nor 
does it prove that “creative" and “deep” youngsters obtain lower test scores. 

The author knows of no scientific studies or investigations by anyone in 
the field of testing that support the critic’s contention that multiple-choice 
tests penalize the creative student. There are many studies, however, that 
demonstrate a considerable relationship, though far from perfect, between 
high test scores and success in college. 

Until demonstrated by scientific investigation that multiple-choice tests 
do significantly penalize certain students, we must proceed on the premise 
that these tests assist in guiding young people. Of course we must continue 
to evaluate and re-evaluate our tests. Bearing in mind the possibility of test 
error, we must also consider other factors in guidance, such as school records 
and teacher recommendations. (See Chapter 11 for a more detailed discussion 
of college admissions testing.) 


Problem Areas 


In addition to the major criticisms previously discussed there are several 
issues relating to testing which present problems to many people. Often these 
problems are compounded by a lack of communication and a misunder- 
standing of the factors involved. 


Intelligence and Intelligence Tests 

A major problem area in testing and one which antagonizes certain groups
is concerned with intelligence, intelligence testing, and IQ. Many of the
problems are created by a lack of understanding of what these terms mean,
and what they do not mean. Let us examine these concepts and terms
more closely.


Intelligence 


It is very difficult to define the word intelligence. Many psychologists and
educators have defined intelligence as a concept that is synonymous with
learning ability. Others have narrowed their definition to a self-evident fact,
namely, intelligence is what intelligence tests measure.

The student of measurement should keep in mind that the term intelligence,
as used in psychology, has no absolute meaning. It is frequently defined to
meet the needs and orientation of the person defining it. Binet, the pioneer
of intelligence tests, described intelligence as a unitary characteristic, that is,
the human organism's ability to adapt to and critically perceive his environment.
Terman (1916), who pioneered Binet’s test in the United States, defined
intelligence as “an individual’s ability to carry on abstract thinking” (p. 42).
Wechsler (1960) states that “intelligence is the aggregate or global capacity
of the individual to act purposefully, to think rationally, and to deal effectively
with his environment” (p. 7).

It is evident from these different concepts of intelligence that there is no 
general agreement among theorists concerning the essence of intelligence, 
and the preceding concepts by no means exhaust the thinking on the subject. 
The interested reader can find further discussion of intelligence by referring 
to the works of Spearman (1927), Thorndike (1927), and Thurstone (1941). 

IQ: What It Means and What It Does Not Mean 

There is a difference between intelligence as conceptualized by the 
theorists and the measurement of intelligence. It is evident that the measure- 
ment of intelligence is related to the ability of a person to perform or function 
in a given test setting. Thus, within this frame of reference the only intelli- 
gence we know about is the specific performance of an individual on a specific 
test. Intelligence as an inherited quality is an abstraction when related to a 
specific measurement instrument. It can only be surmised from behavioral 
responses to a test based on what someone considers "intelligence." 

The student of measurement, if he understands the tentative basis of the 
term intelligence, will be in a better position to evaluate meaningfully the
literature concerning the relationship between intelligence and such variables 
as heredity, environment, race, and culture. 

The most simple definition of IQ is, of course, intelligence quotient.
The Dictionary of Psychology (Warren, 1934) states that IQ is “the ratio of
an individual’s intelligence, as determined by some mental measure, to
normal or average intelligence for his age” (p. 141).
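
A ratio of this kind is easiest to see with numbers. The sketch below assumes the classical ratio formula, mental age divided by chronological age and multiplied by 100, which the passage does not spell out; the function name and the use of months are illustrative choices. It is background for the definition only, since later tests report IQ on a different basis.

```python
def ratio_iq(mental_age_months, chronological_age_months):
    """Classical ratio IQ: 100 * mental age / chronological age.

    Assumption: this is the traditional textbook formula, used here only to
    illustrate the idea of a ratio to the average for one's age.
    """
    if mental_age_months < 0 or chronological_age_months <= 0:
        raise ValueError("ages must be positive")
    return round(100 * mental_age_months / chronological_age_months)


if __name__ == "__main__":
    # A ten-year-old performing like the average ten-year-old.
    print(ratio_iq(120, 120))
    # A ten-year-old performing like the average twelve-year-old.
    print(ratio_iq(144, 120))
    # A ten-year-old performing like the average eight-year-old.
    print(ratio_iq(96, 120))
```

Under this formula a child whose measured performance matches the average for his age always receives 100, whatever that age is.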

For some people IQ has been a magic term that is all-encompassing and
definitively accurate. Since the inception of the first IQ test many teachers
and parents have unreservedly labeled children as bright or dull on the basis
of their IQ. The danger of this assumption is enormous. Good ( i (oo) states,
“The term IQ has been so grossly abused and it is so inappropriate to the
shifting values yielded by various tests that we probably should abandon it
altogether” (p. 179).

As Warren’s definition stated, IQ is the ratio of the individual’s
intelligence as determined by a test. This is “something” which is measured
by a mental test. This definition does not allow for differing types of
intelligence, nor are psychologists certain of what they mean by this kind of
intelligence. Hebb (1958), a noted psychologist,
states that Binet “learned how to measure something without any very clear
idea as to what it was he was measuring.” Today, more than sixty years
later, we measure it a bit more satisfactorily, and now it can be measured
with a great variety of techniques, but we are still somewhat uncertain about
what “it” is.

In addition to our uncertainty of what intelligence is, there are many
variables inherent in the construction and administration of tests that limit
further our equating IQ with innate intelligence: errors in test
construction, sampling of population, scoring, and physical and psychological
conditions of the test setting. Factors such as previous schooling and cultural
environment also play their part in the IQ test score.

The IQ (based on the 1960 Stanford-Binet) fixes the mean (average) at
100 with a distribution of sixteen points (one standard deviation) on either
side; that is, the range of average or normal intelligence is roughly defined
as falling between 84 and 116. Because this distribution is considered within
the normal range, the IQ can be interpreted to indicate a child’s relative
position in a group. Children, for example, with scores on an IQ test above
116 would be considered above average, whereas children with scores below 84
would be considered below average, with specific numerical classifications
defining more precisely their degree of deviation from the average. The IQ,
then, is a method of reporting a test score related to other test scores in a
given group in the same way, though more precisely, that one grade in a
classroom is related to other grades.
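
The cut points described above can be sketched in a few lines. The example assumes that scores follow a normal distribution with mean 100 and standard deviation 16, which is an idealization; the function names are hypothetical, and the three labels are the broad ones used in the passage, not the finer published classifications.

```python
from statistics import NormalDist

# Assumption: an idealized normal distribution of IQ scores with mean 100
# and a standard deviation of sixteen points.
IQ_SCALE = NormalDist(mu=100, sigma=16)


def classify(iq):
    """Apply the broad bands from the text: 84 through 116 is 'average'."""
    if iq > 116:
        return "above average"
    if iq < 84:
        return "below average"
    return "average"


def percentile(iq):
    """Percentage of the idealized group scoring at or below this IQ."""
    return 100 * IQ_SCALE.cdf(iq)


if __name__ == "__main__":
    for score in (70, 84, 100, 116, 130):
        print(score, classify(score), round(percentile(score), 1))
    share = percentile(116) - percentile(84)
    print("share of group between 84 and 116:", round(share, 1))
```

Under these assumptions roughly two thirds of a group falls in the average band, which is why an IQ is best read as a statement of relative position and not as an absolute quantity.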

The accuracy of the reported IQ is directly related to the worth of the test
per se and its purposes and format as well as environmental and cultural
factors. The group intelligence test that reports a test score in the form of IQ
is evaluating, to some degree, different aspects of “intelligence” than an
individual intelligence test. The boy, for example, who reads poorly will
have this deficit in achievement reflected in his group IQ score. On the
other hand, the same child taking an individual intelligence test will not be
penalized because little, if any, reading is required. The two IQ’s may then
mean two different things, although they are both expressed in the same
symbolic structure.

Neither the individual nor group intelligence test is completely accurate.
The IQ symbol representing the results of these instruments should be viewed
in the same light. That is, a reported IQ is only a relative indication of how
an individual compares to others taking the same test. IQ must be considered
along with other evaluative criteria such as other types of tests,
classroom performance, and the observations of the teacher. A more definitive
examination of test score interpretations will be presented in subsequent
chapters. The important thing to remember, at this point, is that intelligence
tests are only one measure of ability, and IQ is only a symbol of this ability.



Heredity versus Environment

The relative influence of heredity and environment on an individual's
[...] and does is determined genetically. [...] feel that development is
[...] person's experiences. Some people have [...] that an individual is the
result of a combination of [...] environmental influences. The
development of the intelligence [...] debate. In the years since the
development of [...] intelligence enormous amounts of data have been
compiled which have made possible comparisons among, for example, [...].
Basic to all of these comparisons is [...] that would provide the reason
for differences among [...] differences the result of heredity,
environment, or [...].

In spite of the countless [...] the answer to this question is as
difficult and [...] been. The debate and investigations continue [...]
rather academic. It must be [...] that the relative influences of [...]
importance to psychologists [...].

HEREDITY

Let us first look at the genetic position. Conrad and Jones (1940)
demonstrated a correlation between the intelligence test scores of parents
and children. That is, parents with high [...] whereas parents with low
IQ's [...] from similar socioeconomic backgrounds [...] studied unrelated
[...] twins have [...] similar genetic [...]. Critics have [...] that twins
[...] environment and that [...] environmental factors may be the [...]
variables. Comparisons have been made [...] identical twins who lived [...]
and those raised [...] also been made of [...] raised together [...] on the



                                          r 
Identical twins reared together         0.925 
Identical twins reared apart            0.876 
Siblings reared together                0.538 
Siblings reared apart                   0.517 
Unrelated children reared together      0.269 
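
The values of r listed above are correlation coefficients. As a rough illustration of how such a coefficient is obtained from paired test scores, the following minimal sketch computes the Pearson product-moment r. The function name pearson_r and the two short score lists are hypothetical and serve only as an example; they are not data from the studies cited, and the original investigators may well have used other computing procedures.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Return the Pearson product-moment correlation of two paired score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least two scores")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations from the two means.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each set of scores.
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when one list has no variation")
    return cross / sqrt(ss_x * ss_y)


# Hypothetical IQ scores for five pairs of children (illustration only).
first_member = [95, 100, 105, 110, 120]
second_member = [97, 99, 108, 109, 118]

r = pearson_r(first_member, second_member)
print(f"r = {r:.3f}")
```

A value of r near 1.00 indicates that the members of each pair tend to hold the same relative standing on the test; a value near 0 indicates little relationship between the paired scores.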


Other investigations have studied brilliant or feebleminded persons to 
determine if such characteristics run in families. The results of most of 
these studies reveal that these traits do tend to run in certain families. 

Some investigators have attempted to remove children from “poor” 
environments to “good” environments and then measure the change, if any. 
Much of this research has been characterized by inadequate scientific control 
and statistical procedures, and results have lacked any clear trend. 


ENVIRONMENT 

Let us now look at environment, which for the purposes of this discussion 
will be divided into two categories, physical and psychological. 

Physical. In the physical area some of the factors which relate to the 
future development of the individual are (1) prenatal environment and 
conditions of birth, (2) dietary conditions of pregnant women, (3) disease 
during pregnancy, and (4) nutritional climate and physical health of the 
child. 

In the prenatal period of development several environmental agents 
influencing later intellectual growth are known. Among these the most 
notable is German measles, which if contracted by the mother during the 
early stage of pregnancy will often attack the fetus and cause mental 
retardation. 

Certain factors operating at the time of birth have been shown to have 
a relationship to mental impairment. Some of these factors are anoxia 
(oxygen deficiency), mechanical injuries, anesthesia, prematurity, and 
abrupt births. 

Mental development may be impaired in early childhood by physical 
trauma to the brain, poisons, certain viruses and bacilli, and very high fever. 
Gross protein starvation has also been noted in some cases of mental 
retardation. 

Psychological. A number of studies have shown that chronic psychosocial 
factors, such as mother-child separation, maternal neglect, and personality 
problems of the mother, influence the mental development of the child. 
Another factor that has been widely studied is the effect of a restricted 
environment upon mental development. 

A study that compared children living in an isolated mountain environment 
with children living in a valley community was conducted over thirty-seven 
years ago by Sherman and Key (1932). The mountain children lived in an 
isolated valley in the Blue Ridge mountains; the village children lived at the 




[...] were lower than those of [...]; however, the decreases were [...] 
to be significantly higher [...] of a few investigators. 

Erlenmeyer-Kimling and Jarvik (1963),* in a review of [...] of the last 
fifty years, [...] consistency in the genetic influences on intelligence [...] 
test. The data also indicate that environment plays an important role in 
mental development. 

[...] revolved [...] heredity-environment [...] conception [...] result of 
[...] environment, [...] that a given trait [...] heritability refers [...] 
in a trait [p. 81]. 

Jensen goes on to state that in order to improve education we should be 

* The reader who is [...] referred to the extensive work [...] 




aware of the child as both a biological and social being. It is his belief that 
“if we do not recognize the biological basis of educability we will restrict 
our eventual understanding and possible control of the major sources of 
diversity in human capacities and potentialities” (p. 39). 

Caspari (1968), a biologist concerned with the issue of heredity and 
environment, feels that there is strong evidence for a genetic influence on 
intelligence. He feels evidence for this view is provided, in part, by genetic 
conditions which can lower intelligence (for example, phenylketonuria and 
galactosemia). He states, “There seems to exist a large number of genes 
influencing this character (intelligence), since almost all chromosomal 
aberrations found in man are accompanied by mental deficiency” (p. 51). 

He goes on to discuss the ratio between heredity and environment and 
concludes that one cannot state which is most important unless we study a 
specific cultural environment. The more important question to Caspari 
is the interaction of heredity and environment in the production of intelli- 
gence. He concludes by stating. 


The challenge to education appears to me to reside in the problem 
of how to create educational methods and environments which will 
be optimally adjusted to the needs of unique individuals. The main 
contribution which a geneticist can make to educational research is 
to stress the fundamental biological fact that every human being is a 
unique individual and that his genetic individuality will be expressed 
in the way in which he reacts to environmental and educational 
experiences [p. 54]. 


The viewpoint of Anastasi (1958) that heredity and environment are 
mutually interdependent seems to be the prevailing one of many psychol- 
ogists and educators in the last twenty years. Stone and Church (1957) in 
their concluding chapter on childhood and adolescence state this position in 
reference to the problem of heredity versus environment. They feel the 
problem is "obsolete.” 


We hope that our discussion throughout this book has made it 
clear how obsolete such problems have become today. On the bio- 
logical level, we know that the organism has to exist in a field, that 
it has no traits except as these are nourished on the environment in 
which it is embedded, that its hereditary endowment defines the 
framework of what it can become, and that within these limits the 
environment acts to shape it. On the psychological level, we have come 
to realize that all human behavior is both hereditary and learned, that 
we must take account of the cultural environment as well as the 
physical one and that the problem is one of defining the ways in which 
a growing person with his own characteristics incorporates a body of 
values, attitudes, beliefs, knowledge, and practices, making them his 




own and at the same time being transformed by them. Now we are 
inclined to realize that each individual incorporates and views his 
culture in his own idiosyncratic way, but that he would be a very 
different person growing up in another culture [p. 402]. 

This writer has noted in the last few years a stirring among psychologists 
to investigate again the relationship of heredity and environment in the 
future development of intelligence and other behavioral characteristics. 
Although the literature does not now reveal this renewed interest to a great 
degree, it seems certain that a return to the intensive investigations of the 
1930’s and early 1940’s will in the years to come be quite evident. The 
impact on the American society of the civil rights struggle and the increasing 
awareness of poverty has already induced psychologists and others to study 
the relationship of race (heredity) and culture (environment) on intellectual 
performance. More will be said on this topic in the next section. 

Let us conclude by stating that it is generally believed today that heredity 
provides the structure within which mental growth takes place. The environment, 
on the other hand, is influential in determining the form and growth 
of certain abilities which might be developed. For example, Mozart and 
Einstein probably would have been outstanding in a primitive culture, but 
they would not have been outstanding in music and mathematics if these 
disciplines did not exist. On the other hand, it is very difficult to conceive 
of an environment that could create an Einstein or a Mozart without the 
genetically determined raw material which these men possessed. 


Test Score Differences Between Races and Social Classes 
The question of differences in test scores for different racial and socioeconomic 
groups is, of course, closely related to our previous discussion of 
heredity and environment. If heredity determines intelligence or is at least 
a predisposing agent, are there significant differences between races and 
socioeconomic groups? If environment is the key to intellectual ability, 
then cultural and educational opportunity should be the determining agent 
in test scores. Let us look at the past fifty years of research to see if any 
conclusions can be drawn. 

Jensen (1968) asks if an “official decision” has been made to create an 
impression that the issue of racial variations has been scientifically tested 
with conclusive results. A publication of the U.S. Office of Education (1966) 
states, “It is a demonstrable fact that the talent pool in any one ethnic group 
is substantially the same as that in any other ethnic group.” Or take the 
Department of Labor (1965) report on the Negro family, which concludes 
that “Intelligence potential is distributed among Negro infants in the same 
proportion and pattern as among Icelanders or Chinese, or any other group.” 

This writer has never seen anything in the literature that supports, with 
scientific certitude, the statements of the U.S. Office of Education and the 
Department of Labor. Jensen (1968) in reviewing these statements states, 




“Such statements entirely lack a factual basis and uncritical acceptance of 
them may unwittingly harm many Negro children born and unborn” (p. 11). 
The discussion that follows will be an attempt to look at some of the data, 
not to “make” a case for one side or the other. 

Research Findings 

After the first intelligence tests were developed, studies were made 
comparing children and adults of different groups. The basic concern was 
to find differences, if any, in various groups of individuals. Comparisons 
were made between different age groups; racial, social, and national groups; 
urban and rural groups; and income levels. The findings indicated appreciable 
group differences. Children living in rural areas and in the Southern or 
Southwestern United States, as well as Indian and Negro children, exhibited 
generally lower scores than children living in Northern urban areas. 

In the area of social and economic levels many studies reveal, on the 
average, that children belonging to upper socioeconomic levels do better 
in intelligence tests than children from the lower strata (Tyler, 1956; Fifer, 
1966; Davis, 1951). This seems to be true in England and Russia, as well 
as in the United States (Johnson, 1948). These studies, however, do not 
clarify the reasons for differences. Some studies inject the additional variable 
of motivational differences. These investigations point to the possibility 
that middle-class children try harder on tests than their lower-class peers 
(Eells, 1951). The interpretation of these investigations has generated 
conflict and confusion among scientists and lay people alike. 

The first large-scale studies of group differences were the result of massive 
testing in World War I. The Army, at the beginning of the war, asked the 
American Psychological Association to help construct a method for classifying 
soldiers according to their mental ability. Robert Yerkes headed a committee 
which developed a group test of intelligence called the Army Alpha Test. 
The development of this test drew heavily upon Arthur Otis’ work. (Otis, 
a former student of Terman’s, had been experimenting with a group test of 
mental ability.) In addition, the committee devised a test for illiterates 
called the Army Beta Test. Almost 2 million soldiers were examined using 
these instruments (Chauncey and Dobbin, 1966). 

The Army Alpha was a test of ability which examined the person’s 
knowledge, simple reasoning, arithmetic, and ability to follow directions. 
The Army Beta was a nonverbal test. Directions were in pantomime and 
removed the effect of differences in language ability from the score. The 
Beta was administered to all men who fell below a given score on the Alpha. 
This group was composed of men who were handicapped by a foreign-language 
background or illiteracy as well as those who performed poorly on 
the Alpha for any other reason. 

The Alpha and the Beta have gone through many revisions since World 
War I and are still in use today. Since their development they have served as 
models for most group intelligence tests. 




The results from testing nearly 2 million men using both the Alpha and 
Beta revealed racial and regional differences. Native-born whites achieved 
[...] 

Garrett (1945) has [...] opportunity are of more importance [...] made on 
the basis of these [...]. 

Garrett (1945) reported [...] Ohio, Pennsylvania, Illinois, and New York 
scored [...] states in the same proportion as they scored below the [...] in 
the whole country. Garrett concludes from [...] improve his score but not 
his [...] white. He notes that white Southerners, even though they [...] 
educational deficits, did as well as Northern Negroes. Thus he [...] if 
Southern whites had the same advantages as Northern Negroes [...] on the 
test. 

Studies since World War I have [...] indications that improvement [...] 
increase in IQ [...] children in New York [...]. Klineberg’s (1935) 
investigation of [...] in New York City [...] frustration, according to [...] 
which would be [...] conditions. [...] same ethnic group as [...] 



four areas: verbal ability, reasoning, numerical ability, and ability to deal 
with spatial concepts. The data showed test performance differences between 
middle-class and lower-class groups regardless of ethnic origin. These 
differences varied from group to group. Of the middle-class groups, the 
Jewish middle class had the highest average scores, whereas the Puerto 
Rican middle class had the lowest average scores. The investigators felt their 
study revealed “strong evidence of differential patterns of mental abilities 
among four . . . ethnic groups” (p. 489). 

A special note on a very important study on the Equality of Educational 
Opportunity (United States Department of Health, Education, and Welfare, 
1966) merits attention in our discussion of race and test score differences. 
This study was undertaken in accordance with Section 402 of the Civil 
Rights Act of 1964 which directed the United States Commissioner of 
Education to 

conduct a survey and make a report to the President and the Congress, 
within two years of the enactment of this title, concerning the lack 
of availability of equal educational opportunities for individuals by 
reason of race, color, religion, or national origin in public educational 
institutions at all levels in the United States, its territories and 
possessions, and the District of Columbia [p. iii]. 

The survey focused on six racial and ethnic groups: Negroes, whites, 
American Indians, Oriental Americans, Puerto Ricans (living in the continental 
United States), and Mexican Americans. The study was conducted 
in the fall of 1965 and involved 4,000 public schools and 645,000 pupils. 

The test results revealed that, with the notable exception of Oriental 
Americans, the average minority student scored much lower on the battery 
of standardized achievement tests in every area than the average white pupil. 
In addition, it was found that this lower achievement for minority groups is 
progressively greater as the educational level increases. 

In the first grade the Negro group’s median (average) score in the “nonverbal” 
test was lower than that of any other group, whereas the Oriental 
American’s score was higher than that of any of the six groups. In the 
“verbal” area the Puerto Rican group was lowest followed closely by the 
Negro group; the white pupils obtained the highest scores. 

In the twelfth grade the Negro group obtained lower scores than any of 
the six groups in the following tests: 

1. Nonverbal. 

2. Verbal. 

3. Reading. 

4. Mathematics. 

5. General information. 

6. Average of the five tests. 




Although the white group obtained the highest [...] closely by Oriental 
Americans [...] that students in the [...] the Negro [...] by Borgatta and 
[...]. 

Martin Jenkins, an early Negro investigator of racial differences, studied 
gifted children of both races in the 1930’s. [...] considerable number of 
Negro children were [...] white children, that is, an IQ of [...] sixteen 
published [...] Negro children with IQ’s above 130, and [...] above 140. 

Jenkins (1954), in discussing gifted Negro [...] states, 

I am not attempting here to [...] Negro children as white are to be 
found [...] of psychometric intelligence. There appears little doubt 
that [...] Negro children is relatively [...] children in the total 
American population. Nevertheless, it is apparent that children of 
very superior psychometric [...] found in many Negro [...] of the 
range attained by the extreme deviates [...]. 

[...] sides of the controversy [...] Spuhler and Lindzey [...] agree in one 
important area, [...] IQ of the American Negro is 85 or [...] that of the 
typical American [...]. 

The studies that have been [...] of the hundreds of investigations [...] 
concerned with [...] economic differences. The student who desires to [...] 
subject further will find an [...] in Shuey’s [...] treatment of the 
multitude [...] Negro and white [...] since World War I. Although Shuey 
[...] point of view [...] from her excellent scholarship [...] both sides of 
the controversy. 

In the next two sections of this chapter an [...] all the previously 
mentioned data [...] the educational objectives [...] all children within our 
[...] some general statements by knowledgeable authorities [...]. 

Shuey (1965) states, 

[...] whether they [...] to 12, to children in Grades [...] 


students to enlisted men or officers in training in the Armed Forces 
in World War I. World War II, or the Post-Korean period— to 
veterans of the Armed Forces, to homeless men or transients, to 
gifted or mentally deficient, to delinquent or criminal; the fact that 
differences between colored and white are present not only in the 
rural and urban south, but in the Border and Northern states; the 
fact that the colored preschool, school, and high school pupils living 
in Northern cities tested as far below the Southern urban white 
children as they did below the whites in the Northern cities; the fact 
that relatively small average differences were found between the IQ's 
of Northern-born and Southern-born Negro children in Northern 
cities; the fact that Negro school children and high school pupils 
have achieved average IQ’s slightly lower in the past twenty years 
than between 1921 and 1944; the tendency toward greater variability 
among whites; the tendency for racial hybrids to score higher than 
those groups described as, or inferred to be, unmixed Negro; the 
evidence that the mean overlap is between 7 and 13 percent; the 
evidence that the tested differences appear to be greater for logical 
analysis, abstract reasoning, and perceptual-motor tasks than for 
practical and concrete problems; the evidence that the tested differences 
may be a little less on verbal than on nonverbal tasks; the 
indication that the colored elementary or high school pupil has not 
been adversely affected in his tested performance by the presence of 
a white examiner; an indication that Negroes may have a greater 
sense of personal worth than whites, at least at the elementary, high 
school, and college levels; the unproved and probably erroneous 
assumption that Negroes have been less well motivated on tests than 
whites; the fact that differences were reported in practically all of 
the studies in which the cultural environment of the whites appeared 
to be similar in richness and complexity to that of the Negroes; 
the fact that in many comparisons including those in which the 
colored have appeared to best advantage, Negro subjects have been 
either more representative of their racial group or more highly 
selected than the comparable whites; all taken together, inevitably 
point to the presence of native differences between Negroes and whites 
as determined by intelligence tests [pp. 520-21]. 

Jensen (1969) is in general agreement with Shuey. He argues that the 
failure of our educational efforts to compensate for disadvantaged environ- 
ments is directly related to the false premise that IQ differences are the 
result of environmental conditions and the cultural bias of tests. It is his 
contention that genetic factors are more important in determining IQ. 
Jensen feels that environmental deprivation can keep a child from performing 
up to the level of his genetic potential, but the best programs cannot elevate 
the child above that potential. 




On the other side of the argument, it is of interest to note that in September 
1961, at the American Psychological Association’s convention, the Society 
for the Psychological Study of Social Issues stated the following: 

The evidence of a quarter of a century of research on this problem 
can readily be summarized. There are differences in intelligence 
when one compares a random sample of whites and Negroes. What is 
equally clear is that no evidence exists that leads to the conclusion 
that such differences are innate. Quite to the contrary, the evidence 
points overwhelmingly to the fact that when one compares Negroes 
and whites of comparable cultural and educational background, 
differences in intelligence diminish markedly. The more comparable 
the background of white and Negro groups, the less the difference in 
intelligence. There is no direct evidence that supports the view that 
there is an innate difference between members of different racial 
groups. 


Garrett (1962) in reviewing the preceding, declares that Negro-white 
differences in mental tests are so regular that he feels they suggest a genetic 
basis. 

Cronbach (1960) states, 


Racial comparisons have frequently been misinterpreted because 
liberal writers want to prove that there are no innate differences in 
ability, and certain conservatives want to prove that nonwhite groups 
will not profit from improved educational opportunity [p. 204]. 


Cronbach’s view is amplified further by Spuhler and Lindzey (1967), 
who after citing numerous studies and summaries of studies state, “It seems 
clear, however, that with all this investigation the position of most modern 
observers is at least as much influenced by prior belief as by present findings" 
(p. 391). 

In reviewing all the literature, one thing is quite clear: the vast majority 
of studies reveal very significant differences between test scores of different 
racial, ethnic, and socioeconomic groups. This writer’s own position on the 
conflictual studies and interpretations is in agreement with Spuhler and 
Lindzey (1967): “What we can say with confidence is that racial groups 
differ in intelligence as measured by existing instruments. The extent to 
which these differences are to be attributed to biological factors (race) rather 
than to experiential (particularly cultural) factors remains largely unknown” 
(pp. 391-92). 


Educational Reality and the Disadvantaged 

In order to assist an individual in his choice of and preparation for education 
and meaningful work we must know his past achievements, present 




capabilities, and future potential. One method of evaluating these is by 
testing. Are these tests useful in guiding the disadvantaged? First, let us 
focus our attention on the achievement test, the area of testing that is the 
least controversial when dealing with the disadvantaged. Lennon (1964) states, 


It is axiomatic that the school and the teacher must know the present 
status of each child, and the progress he is making, with respect to 
certain concrete goals of instruction. . . . Whatever advantages or 
limitations a child brings to his school learning tasks, the school and 
the teacher still must be concerned with how successfully he is attaining 
the goals of instruction; and this is one of the contributions that the 
standardized achievement test makes, whether in the case of the 
culturally disadvantaged or the more fortunate pupil. 


The school needs to know the present status and progress of both its 
advantaged and disadvantaged children.* Thus we may answer part of our 
question; achievement tests are useful in guiding disadvantaged children. 
The teacher, of course, is in an excellent position to know whether the 
achievement test is appropriate for her particular students and should share 
this information with the school administration. 

Proper test administration is very important, especially with disadvantaged 
children. Many articles and books have been written about the alienation of 
culturally disadvantaged youngsters. The test situation, unless properly 
handled, could reinforce this alienation. The test administrator must be 
especially careful to convey to children his concern and interest. Explanations 
concerning the purposes of testing should always be made clear before the 
examination begins. Disadvantaged children need to know that testing is 
not another obstacle or method of “getting them.” 

In testing these children the gravest question is, “Are IQ tests fair for the 
disadvantaged?” In order to address ourselves to the question of fairness or 
unfairness of IQ tests for the disadvantaged, we must first briefly review why 
IQ tests are given. In general terms, IQ tests are given in order to determine 
what an individual is capable of doing. What are Bill’s chances for success in 
school? Will he need special attention? If so, what kind of instruction? Is he 
mentally retarded or gifted? If an IQ test can help to show that Bill needs 
special attention and that attention is then given, Bill has obviously received 
immense help. 

Present performance may be gauged by school grades, teacher opinion, 
and standardized achievement tests. Future potential cannot be ascertained 
in the same way, because the teacher’s judgment is already reflected in grade 
evaluation and opinion. Another evaluative technique must be used to make 

* The following discussion of reality and the disadvantaged is based in part upon 
a previous publication of the author. See Karmel (1967). 




sure that Bill is not being judged by factors which may be unrelated to his 
academic potential, such as teacher bias or poor academic background. 

Thus, in using a test of future potential, along with the teacher’s opinion 
and data from objective achievement tests, we have a situation similar to the 
checks and balances in our government. The teacher’s opinion and evaluation 
is checked and weighed against what a child may accomplish on standardized 
achievement tests, in which his performance is compared to children all 
over the country, not just those in his classroom. 

If IQ tests and other data help the school to understand each individual 
(as an overwhelming number of studies show), then certain inferences must 
be made. These inferences are drawn from an interpretation of the test 
results. If the test data do not discriminate between individuals, the teacher or 
guidance counselor cannot make inferences, nor can he help the individual student. 

In attempting to facilitate learning the school is concerned with what a 
pupil can and will do. A pupil’s race or socioeconomic status is not important 
in educational planning. What is important is the individual’s present status 
and future potential, that is, what he has attained and what he may become. 

We are living in a middle-class-oriented society and the school is a reflection 
of this society. In judging a student the school is not evaluating him in terms 
of his ability to “get along’’; it is appraising his ability to function and grow 
in an educational setting. 

In discussing this problem in a previous publication (Karmel 1967), the 
author has stated the following: 


If IQ tests are uniformly unfair to Negroes and other groups, then 
so are our schools and perhaps more importantly our educational 
aspirations for our children are in need of radical replacement. We 
seem to be in a quandary concerning the basic rule of education. On 
the one hand we demand higher and higher standards of excellence 
from our students and on the other hand there are those who say that 
we are unfair to have standards, and it is of as much value for a child 
to know the codes of the city street. Does recognition of a child’s 
cultural deprivation and different mores mean one should accept, 
condone, and perpetuate them? 

The value we place on educational achievement may be contested 
by some, but most educators and lay people, both Negro and white, 
would agree on the efficacy of educational advancement as a key to 
the enrichment of the individual and his society. The methods in 
achieving this goal may differ, but in learning one must have the 
“stuff” to learn with no matter what the learning technique. Thus, 
the question of the fairness of the IQ test is directly related to the 
educational mores of our society for in the last analysis the IQ test is 
measuring the potential of Johnny to succeed in a school system 
oriented to society. If it was not aimed in this direction it would have 
little value [pp. 11-12]. 




In essence, then, the variables of genetics, environment, race, socioeconomic 
status, and cultural bias are academic questions when we are considering the 
practical educational reality of how well a pupil will do and how the school 
can best serve the individual student with his own unique potential to learn. 
The student of measurement should not infer from this that these variables 
are unimportant. On the contrary, they are extremely important when 
considered in their proper context. If our goal is to predict the chances for 
an individual pupil to succeed in our present school system, then we must 
have tests that reflect this system. The reasons that an individual may or 
may not be able to learn is an issue beyond that of testing. Testing is not 
geared, nor should it be, to treatment. It may be used to facilitate treatment 
by providing data which the school may use to render individual educational 
planning and guidance. To illustrate this point let us look at Smith Junior 
High School. 

Smith Junior High School is located in a predominantly disadvantaged 
neighborhood. Mr. Lowell, the principal, is concerned about the 
appropriateness of group IQ tests for his students. This concern is shared by many 
of his teachers. Many feel that IQ tests are unfair because of their cultural 
bias. These concerns prompt Mr. Lowell to call in Mr. Fields, a testing 
consultant, to help guide the school in its educational planning. The faculty 
meeting is open for discussion and the science teacher poses the first 
question: 

Science Teacher: Is it fair to say that one of our students is dumb 
because he gets a low IQ or test score? I feel it isn’t because the tests 
are made up for middle-class kids and our kids don’t have the same 
exposures. 

Mr. Fields: I agree it is not fair to say your kids are dumb. It is 
fair to say, however, that the chances of success in school are not too 
good for your kids who obtain low scores on IQ tests. 

Science Teacher: What do you mean by not too good? 

Mr. Fields: Studies have shown that there is a high relationship 
between IQ scores and school success. That is, children who score 
at the average level or above have a better chance of staying in school 
and graduating and they are able to do the work required. 

Social Studies Teacher: I agree with the studies but it still doesn’t 
seem fair to label these youngsters. 

Mr. Fields: I agree, these youngsters should not be labeled. The 
purpose of testing is to aid, not hinder, educational growth. 

Mr. Lowell: Are you saying that IQ tests are not indicative of 
innate intelligence? 

Mr. Fields: No. I am not saying that they are not indicative of 
innate intelligence, nor am I saying that they are able to identify 
innate intelligence. We truthfully do not know. What we do know is 
that they are very helpful in predicting school success. This is why 
I prefer to call them scholastic aptitude tests rather than intelligence 
tests. 

Math Teacher: Well, that may all be true but I know they are used 
to label and since they are not perfect or near perfect how can you 
defend them? 

Mr. Fields: I cannot defend their exclusive use but in conjunction 
with teacher evaluations they enhance educational prediction. 

Mr. Lowell: What about using a culture-free test? 

Mr. Fields: There are no such tests. Yes, some tests are labeled 
culture-free or culture-fair, but research does not bear out the promise 
of their titles. They too in some measure reflect the culture. 

Librarian: If we could have a test that was free of cultural bias 
would you recommend its use? 

Mr. Fields: It would depend on several factors. First, what would 
its predictive utility be? Secondly, how would you use it? That is, if 
it predicted school success as well as or better than our present 
instruments without penalizing youngsters because of cultural background, 
then of course I would use it. If it didn’t predict school success but 
could be used as an aid in detecting youngsters with ability who could 
be educationally salvaged by intensive teaching, then I would still 
use it. 

Even though your children come from disadvantaged homes you 
will still find some who do well on tests and you can advise these 
children accordingly. On the other hand, without these tests a child 
in your school may think he is doing quite well and not know of his 
standing with youngsters from different areas of the county. 


Let us leave Smith Junior High School and direct our attention to individual 
cases. John, a six-year-old Negro child, is evaluated by an IQ test. The test 
reveals that he is on the borderline between dull normal and mentally 
retarded. This is a situation where the knowledge of cultural bias would be 
important. There is no doubt that John's test results should be received with 
more skepticism and caution than if he were white. The school should weigh 
other factors and not rely solely on this score. If, however, John were twelve 
years old and tested at the same level, the results in terms of educability 
and reality would be less suspect. This is not to say that there might not still 
be error, but the chance of making up for the cultural deprivation would 
be less. 

The student of measurement should remember that the interpretation of 
IQ scores, as is true of the scores of other tests, must be made on the basis 
of many factors, including culture, but in the final analysis the scores should 
be interpreted in accordance with the objectives of testing. Whether the child 
is advantaged or disadvantaged he competes and lives in a society geared 
to middle-class objectives and values. 

American education must face the problem of how best to educate the 
disadvantaged. Alleviation of this problem, however, does not lie in the 
eradication of the instrument that gauges its existence. The IQ test is only 
an instrument to gauge what is, not what should be. Thus, the answer to 
cultural and socioeconomic deprivation lies in rectifying and treating the 
etiological agents that create the disadvantaged community, not in doing 
away with an instrument that helps us know of its presence. To do away with 
IQ tests because they measure social problems is as logical as throwing out 
medical procedures that indicate cancer because a treatment today is not 
available. 


A Practical Approach to the Use of Standardized Tests 


The reader has now been exposed to what an IQ is and what it is not; 
what “intelligence” is and what intelligence tests are; discussion of the part 
heredity and environment play in human behavior; test score differences in 
Negro and white children and adults; and interpretations of test scores 
for the disadvantaged. The question before us is how to translate these 
facts, findings, interpretations, and points of view into sound educational 
policy. 

Sound educational decisions emanate from sound educational policy. 
Educational policy must be based on the best available information gleaned 
from experimental investigations and empirical observations. Although the 
issues, conflicting data, and different interpretations seem confusing, there 
are areas of agreement that can be used as a basis for sound educational 
policy. They are the following: 


1. Tests are not infallible but, used along with other methods of 
evaluation such as teacher judgments, they are very useful in helping 
individuals realize their own unique potential. 

2. Educators must understand the limitations as well as the 
advantages of tests. 

3. Tests should never be used as an excuse to avoid human judgment 
or contact. 

4. The number of tests used is determined by the amount of 
information needed. 

5. Hereditary differences do not alter the fact that most 
investigators recognize the importance of environment in shaping the 
child’s intellectual ability. 

6. Tests should always be used in conjunction with educational 
objectives. Interpretations of test scores are only meaningful when 
applied to educational and individual goals. 


References 

Anas,. si, A. Some imp.ica.ions 

sr ■ 

Black, H. They G R^i^'of EquaUtyof educational opportumty. 

Houghton Mifflin. 1966. Pp- 7-8. ^ f,„i,i,l semblance in 

Da^rr socioeconomic influences upon children's learning. Veieretend^i ‘ ‘ 

^ rthd 1951, 20, 10-16. , , s, cate for ttatioml ottm. Office of 

net. eurriculuml •'■y'-'’"' “ 

Ferris, F. L., Jr- 5 M 00 / Rfvim. 1962, TO. mental abilities. 

common sense. coUural group difterencra ej,), mmemery 

Garrett, H. E. T e ,• ™ In C !• Chase and H. G. 

ooo\^rR%>— 

-...-r..n..W.B.Saundem,«58..„„ 



Holt. 1965. 




Jensen, A. Social class, race, and genetics: implications for education. American 
Educational Research Journal, 1968, 5, 1-42. 

Jensen, A. How much can we boost IQ and scholastic achievement? Harvard 
Educational Review, 1969, 39, 1-123. 

Johnson, D. M. Application of the standard-score IQ to the social statistics. 
Journal of Social Psychology, 1948, 27, 217-27. 

Kardiner, A., and Ovesey, L. The mark of oppression: A psychological study of the 
American Negro. New York: Norton, 1951. 

Karmel, L. J. Do IQ tests discriminate against Negroes in their preparation for 
the world of work? New York Personnel and Guidance Bulletin, 1967, 19 (3), 
10-13. 

Klineberg, O. Negro intelligence and selective migration. New York: Columbia 
University Press, 1935. 

Klineberg, O. (Ed.) Characteristics of the American Negro. New York: Harper and 
Row, 1944. 

Lee, E. S. Negro intelligence and selective migration: a Philadelphia test of the 
Klineberg hypothesis. American Sociology Review, 1951, 16, 227-33. 

Lennon, R. T. Testing and the culturally disadvantaged child. Paper read at 
series on Problems in Education of the Culturally Disadvantaged, Boston, Feb., 
1964. (Copies may be secured from the Test Department of Harcourt, Brace & 
World, Inc.; 757 Third Ave.; New York, N.Y.) 

Loretan, J. O. Alternatives to intelligence testing. Curriculum and Materials: 
Board of Education of The City of New York, 1966, 20 (3), 6-9. 

Lorge, I. Difference or bias in tests of intelligence. In A. Anastasi (Ed.), Testing 
problems in perspective: Twenty-fifth anniversary volume of topical readings from 
the invitational conference on testing problems. Washington, D.C.: American 
Council on Education, 1966. Pp. 465-71. 

Montagu, M. F. A. Intelligence of northern Negroes and southern whites in the 
first world war. American Journal of Psychology, 1945, 58, 161-88. 

Myrdal, G. An American dilemma. New York; Harper & Row, 1944. 

Newman, H. H. Multiple human births. Garden City, N.Y.: Doubleday, 1940. 

Rosenberg, L. A. Scientific racism and the American psychologist. Paper presented 
at the meeting of the District of Columbia Psychological Association, 
Washington, D.C., October, im. 

Sherman, M., and Key, C. B. The intelligence of isolated mountain children. 
Child Development, 1932, 3, 279-90. 

Shuey, A. M. The testing of Negro intelligence. (2nd ed.) New York: Social Science 
Press, 1966. 

Spearman, C. E. The abilities of man: Their nature and measurement. New York: 
Macmillan, 1927. 

Spuhler, J. N., and Lindzey, G. Racial differences in behavior. In J. Hirsch (Ed.), 
Behavior-genetic analysis. New York: McGraw-Hill, 1967. Pp. 366-414. 

Stone, L. J., and Church, J. Childhood and adolescence: A psychology of the growing 
person. New York: Random House, 1957. 

Terman, L. M. The measurement of intelligence. Boston: Houghton Mifflin, 1916. 

Thorndike, E. L., et al. The measurement of intelligence. New York: Columbia 
Teachers College, 1927. 

Thurstone, L. L., and Thurstone, T. G. Factorial studies of intelligence. 
Psychometric Monographs, 1941, No. 2. 




Tyler, L. The psychology of human differences. (2nd ed.) New York: Appleton- 
Century-Crofts, 1956. 

United States Office of Education. American Education, October, 1966. 

United States Department of Health, Education, and Welfare. Equality of educa- 
tional opportunity: Summary report. Catalog No. FSS.238:38000. Washington, 
D.C.: United States Government Office, 1966. 

Vandenberg, S. Contribution of twin research to psychology. Psychological 
Bulletin, 1966, 66, 327-52. 

Warren, H. C. (Ed.) Dictionary of psychology. Boston: Houghton Mifflin, 1934. 

Webster’s new collegiate dictionary. (6th ed.) Springfield, Mass.: G. & C. Merriam, 
1961. 

Wechsler, D. The measurement of adult intelligence. (4th ed.) Baltimore: Williams 
and Wilkins, 1960. 

Yerkes, R. M. (Ed.) Psychological examining in the United States Army. National 
Academy of Science Memoirs, 1921, 15. 




CHAPTER 


3 

Test Ethics, Standards, 
and Procedures 


The major criticisms and controversies of testing have been presented. 
One of the most obvious conclusions from a review of test criticism is that 
many times the attack should not be directed at the test per se, but at the use 
of tests. In this chapter an attempt will be made to give the student some 
guidelines for proper test usage. 

Before entering the important area of test standards and procedures let 
us turn our attention to the principal characteristics of tests. In Chapter 1 
we defined and briefly discussed what a test is in general terms. It was stated 
that to test is to evaluate and compare one person or variable with a given 
criterion or criteria. Using this definition, can we state that a psychological 
test evaluates intelligence or personality, or does it measure both? What do 
you think? What are the basic ingredients of a psychological test? 

The basic purpose underlying the construction and use of psychological 
tests is to sample some aspect of an individual's behavior. This is true whether 
the behavior is mental ability or personality characteristics, whether the 
objective is to evaluate reading progress or interest patterns, whether the 
purpose is to measure attitudes or possible brain damage, whether the purpose 
is to measure creativity or musical talent. The list of possible behavioral 
samples is lengthy. The point to remember is that a psychological test is 
not defined by content as much as it is by function. 


The psychological test attempts to measure an aspect of 
behavior in an objective and standardized manner. (See section on 
standardization, Chapter 1.) The primary intent is to sample human behavior without 
introducing human subjectivity. This goal, of course, has never been fully 
reached on any psychological test, because perfect standardization and 
objectivity have not yet been attained. On the other hand, a reasonable amount 
of success in this area is evident with the majority of tests. 

Teacher-made tests are not generally considered psychological tests because 
they usually do not fulfill the requirements of objectivity and standardization. 
Our discussion of ethics, standards, and procedures in this chapter will be 
focused on psychological tests. 


Some History 

The development of an official body of standards for testing is relatively 
new. It was not until 1954 that an official statement on test procedures was 
published.* This statement represented a consensus of data that was 
considered most beneficial to a test consumer. In the period before 1954 test 
standards and quality varied according to the ethics, standards, and knowledge 
of individual test authors and publishers. In order to understand more fully 
the importance of standards in testing let us briefly review the major periods 
and issues. 

The first use of the term mental test in the psychological literature was in 
an article by Cattell (1890). The article discussed the use of tests of muscular 
strength, speed of movement, reaction time, sensory discrimination, and 
other measures used to determine intellectual levels of college students. 
Other investigators, such as Ebbinghaus (1897), devised tests of arithmetic 
computation, memory span, and sentence completion to measure school 
children’s intelligence. Many more investigations could be cited, but the 
important point is that these early efforts were to lead to the development of 
the Binet intelligence scales. 

In 1905 Binet, along with Simon, developed the first intelligence test 
similar to our present tests. The 1905 scale was a tentative instrument with- 
out an objective method for deriving a total score. The second Scale (1908) 
introduced the term mental age, which compared children to normal children 
of the same age. (For example, a mental age of five years means a child passed 
all the items normal five-year-olds would pass.) In Chapter 7 we will deal 
with Binet’s first efforts in more detail. 

The work of Binet was reviewed with earnest enthusiasm in the United 
States and his tests were translated from French into several English versions, 
the most noteworthy of which was the Stanford-Binet. 

*The first official declaration on testing standards was called Technical 
Recommendations for Psychological Tests and Diagnostic Techniques, written and published 
by the American Psychological Association in 1954. 



This was the period during which workers such as Terman (1916) and 
Kuhlmann were developing their revisions of the Binet scales. 

The next fifteen years may be regarded as the "fad" period in test 
development. Individual tests were limited in that they could be given to only 
one person at a time, and this period saw the emergence of group testing. 
The development of group testing, as you will recall, was brought about by 
the entry of the United States into World War I. The need for swift 
intellectual classification of over a million recruits led to the development 
of the Army Alpha. Group tests were then developed and used widely at a time 
when measurement was still in its infancy. People have a tendency to jump on 
the bandwagon of something new, and Americans are no exception. When the 
tests failed to meet expectations, skepticism and hostility toward all testing 
resulted, and the indiscriminate use of tests did as much to retard as to 
advance the progress of testing. 

This mounting criticism led to a period of appraisal, from about 1930 to 
1946, marked by more exacting standards of test construction and development. 
The successful use of tests during World War II did much to restore 
confidence in measurement. The years from about 1946 witnessed a rebirth of 
interest in measurement and in group testing, with a growing use of 



tests and test batteries. Large-scale testing programs in local school systems 
were launched and those administered by the College Entrance Examination 
Board were expanded. During this time a new college entrance examination, 
called the American College Testing Program, was founded.* 

This renaissance of measurement also saw the birth of test standards. It 
was in this period that the American Psychological Association (1954) and 
the American Educational Research Association and the National Council 
on Measurements Used in Education, Committee on Test Standards (1955) 
first published recommendations for the construction and use of tests. 

In the early 1960’s, as had been the case in the 1930's, a negative reaction 
to testing became evident. This time, however, the bulk of test criticism was 
voiced by lay critics. Most of these critics had little knowledge of testing and 
did little research to back up their attacks. (See Chapter 2.) 

The period from 1962 to the present may be characterized as the 
“illegitimate criticism” era of test history. This may be contrasted to the “legitimate 
criticism” period of the 1930's and early 1940's. Whereas the first period 
was initiated, directed, and pursued by professionals who were earnestly 
attempting to rectify past errors in order to produce better tests, the present 
critical period sees nonprofessionals attacking tests in a sensational and 
nonscholarly manner.† A destructive rather than a constructive approach is 
evident. 

In the light of test history and the recent barrage of critical attacks on 
measurement the importance of standards cannot be overemphasized. As a 
student of measurement it is particularly important for you to be aware of 
the code of test ethics and standards we are about to discuss. 


Ethics 

The history of psychological measurement has witnessed rigorous 
adherence to the scientific method, personal dedication, and creativity. On the other 
hand, there has been needless duplication, intrusion of profit motivation, and 
improper scientific attitude and procedure. To combat the misuses of 
psychological tests, professional organizations concerned with measurement have 
developed ethics and standards. In 1963 the American Psychological 
Association published an article entitled “Ethical Standards of Psychologists.” 
Testing is a major area of concern in this document which codifies the 
professional ethics of the association. The proper use of psychological tests 
is also featured in “Ethical Standards” (American Personnel and Guidance 
Association, 1961), which is the code of professional ethics of the American 
Personnel and Guidance Association. 

*A federation of state programs founded in 1959. Objectives are similar to the 
College Board program. (See Chapter 11.) 

†A notable exception is Testing, Testing, Testing (American Association of School 
Administrators, Council of Chief State School Officers, and National Association of 
Secondary-School Principals, 1962), a brief booklet which attempts to take a critical 
look at testing in a coherent and productive manner. 

In addition to these codes, professional organizations (American 
Psychological Association, American Educational Research Association, National 
Council on Measurement in Education, 1966) concerned with measurement have 
joined together to produce the "bible" for test standards. It is called 
Standards for Educational and Psychological Tests and Manuals. Let us now 
turn to specific areas pertaining to ethics and standards as formulated by 
these professional organizations. 

Distribution and Sale 

One of the most important points in the ethical use of psychological 
tests is that their sale and distribution be restricted to qualified users. The 
qualifications of the purchaser should be appropriate to the type of 
test being sold. For example, a teacher may be qualified to 
administer, score, and interpret tests of achievement 
and interest, but he may not be qualified to administer an individual 
test such as the Stanford-Binet or a personality test. The American Personnel 
and Guidance Association (1961) states that different tests demand different 
levels of competence for administration, scoring, and interpretation, and that 
it is therefore the responsibility of the member to recognize the limits of his 
competence and to perform only those functions which fall within his 
preparation and competence. 

The American Psychological Association (1963) states that psychological 
tests are offered for commercial publication only to publishers who present 
and represent their tests in a professional way and distribute them only to 
qualified users. The professional qualifications required for sound 
interpretation of a test should therefore be made clear. 

Test Interpretation 

One of the major concerns of workers in measurement is the interpretation 
of test results to the general public. “Ethical Standards of Psychologists” 
states that test scores are “released only to persons who are qualified to 
interpret and use them properly” (American Psychological Association, 1963, 
p. 59). This principle stresses three basic points: (1) close supervision by 
qualified psychologists or counselors; (2) communication of test results to 
individuals in such a manner as to guard against misinterpretation, giving 
interpretations of results rather than scores; and (3) use of adequate 
interpretive aids in communicating results to parents and/or students. 



Many of the abuses and misuses of tests have resulted from the use of 
tests by persons with little or no training. Test results should only be 
released to qualified personnel, and adequate facilities for further counseling 
should be available if the results are particularly disturbing to the individual. 
For example, John Smith, a senior at Exodus High School, is in the upper 
quarter of the class. His college entrance examination scores are very low 
and he is understandably very depressed and despondent over the results. 
At this point further counseling is certainly indicated. The counselor or 
psychologist could probe for the possible personal reasons for the low 
scores and he might discuss possible measurement factors that could account 
for the discrepancy. 

If John Smith does not receive appropriate counseling he may decide 
against going to college and his self-concept could be appreciably lowered. 
In some cases extreme disappointment could lead to attempts at suicide. 
Thus even the most seemingly innocuous test results must be handled by 
qualified personnel who are trained to counsel as well as interpret test results. 

Test Security 

Test security is one of the most important ethical concerns of psychologists 
and educators. Control of the distribution and sales of tests is needed not 
only to prevent unqualified persons from using tests but to prevent public 
familiarity with test items which could interfere with test validity. Principle 
13 of “Ethical Standards of Psychologists” (American Psychological 
Association, 1963) states, 

Psychological tests and similar assessment devices, the value of 
which depends in part on the naivete of the subject, are not reproduced 
or described in popular publications in ways that might invalidate 
the techniques. Access to such tests is limited to persons with 
professional interests who will safeguard their use. 

a. Sample items made up to resemble those of tests being discussed 
may be reproduced in popular articles and elsewhere, but scorable 
tests and actual test items are not reproduced except in professional 
publications. 

b. The psychologist is responsible for the control of psychological 
tests and other devices used for instruction when their value might 
be damaged by revealing to the general public their specific contents 
or underlying principles [p. 59]. 

It is readily apparent that if a high school senior obtained a copy of a 
college admissions examination, or the answers to many of the questions, 
the test would no longer be a measure of college aptitude for him. One of 
the consequences of such a situation might be admission to a college where 
the candidate has little chance of success. 


Many times tests are not properly safeguarded, and 
every teacher and counselor has seen examples of this. For example, 
during a school testing program, the author witnessed a 
counselor instructing eighth-grade children before administering an algebra 
aptitude test. “Boys and girls,” he said, and he went on to show them how to do 
the kinds of problems on the test. The counselor 
“meant well” and felt that he was “helping” the children; he was 
in fact hurting them. The purpose of the test was to select children for 
ninth-grade algebra and general mathematics. Those students not ready for algebra 
could be spared needless academic shocks. The validity 
of the test was reduced; by how much it is impossible to tell. 

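Invasion of Privacy 

Invasion of privacy is a concern of the testing profession as well as the 
general public. According to “Ethical Standards of Psychologists,” the 
psychologist who asks an individual to reveal personal information in the 
course of interviewing, testing, or evaluation, or who allows such 
information to be divulged to him, does so only after making certain that the 
person is aware of the purpose of the interview, testing, or evaluation and of 
the ways in which the information may be used (American Psychological 
Association, 1963). Test data should be kept confidential, released for 
appropriate purposes and only with permission. 

Standards 

As noted earlier, before 1954 there was no official guide representing a 
consensus on test standards. Although this is 
true it should be noted there were informal test standards before that time. 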
These standards could be found in textbooks and other publications. Test 
publishers and authors interested in quality have generally adhered to these. 
It is also true, however, that less dedicated or knowledgeable authors and 
publishers produced tests that fell short of these informal standards. In 
order to lessen the occurrence of inadequate tests, professional associations 
concerned with testing produced official statements of measurement 
standards. The discussion of testing standards that follows will be generally 
based on Standards for Educational and Psychological Tests and Manuals 
(American Psychological Association et al., 1966).* Our attention will be 
particularly focused on areas of concern to the user of tests. 


Advertising 

Test publishers should present their tests in an accurate and complete 
manner. Beware of extravagant claims in promotional materials. For example, 
an advertising brochure for an achievement test may state that, “This 
instrument measures the modern day educational objectives of American 
History.” This, of course, is very difficult, because curricular objectives differ 
from school to school and from teacher to teacher. The statement suggests 
that the test is suitable for all classes in American History. It may indeed be 
an excellent test but the potential user cannot be sure, without detailed 
inspection, that the test is in line with the curricular objectives of his school. 


Test Age 

There is no specified period that a test and manual may be used without 
revision. It is recommended (American Psychological Association et al., 
1966), however, that a publisher should withdraw a test from use if the manual 
is fifteen years old or more. Society and educational objectives change with 
time. For example, a test to predict algebra aptitude might be completely 
outdated by the “new math.” 

The relevant questions concerning test age are: What is the relationship 
of the test to current educational practices and to the contemporary society? 
How recent is the date of publication and/or revision? Is there new data 
with the new revision? Are there dates given for the collection of the new 
data and new norms? 


Test Manual 

The test, the manual, and all other accompanying material should be 
geared to helping test users evaluate test results. It is very important that 
the users of tests know what the results mean. One method of facilitating this 
is making sure that all material dealing with the interpretation of the test 
is clear and correct. This means that it should have meaning to a school 
teacher as well as to a measurement specialist. This is extremely important. 
If the user is not able to read the manual with understanding, he will not be 
able to interpret the test scores adequately. 

The manual should stress the vulnerability of the test and discuss factors 
other than the test score that need to be considered in interpreting the results. 
The test manual should assist the test user by stating precisely the basic 
purposes and uses for which the test is intended. 

It is very important that the test manual indicate to the prospective 
purchaser and test seller the qualifications required to administer and 
interpret it. If a test may be used for different purposes, the manual should state 
the amount of training needed for each use. 

Be careful of statements in manuals that do not have a statistical basis. 
If, for example, you read that such and such a score indicates “psychotic 
tendencies,” look for statements that tell you what proportion of people 
obtaining that score have later been identified as psychotic. 

It is essential that every test manual report the validity of the test for 
each interpretation to be made. It is incorrect to say “validity of the test”; 
one can, however, speak of the validity of particular interpretations. For 
example, the manual for an English test of mechanics of expression may state 
that the test is appropriate for high school juniors and seniors planning 
to go on to college. However, is it able to discriminate among those students 
in honors classes who are planning to matriculate at highly selective 
colleges? 

It is essential that every test manual report the reliability† of the test. 
The manual should report the evidence of reliability and the method used in 
obtaining it. The yardstick for reliability should apply to every score or 
combination of scores. For example, if a test yields a verbal and a nonverbal 
score, reliability for both scores should be reported. 

Test manuals should contain directions for administration that you can 
understand and practice. Be sure that the directions are clear enough that 
your students will understand the tasks that are required. The scoring 
procedures should also be presented in detail and with clarity so as to 
eliminate scoring errors. 

Norms‡ should be presented in every test manual. They should refer to 
specific populations so that the children being tested may be compared to 
these reference groups. 

* A copy of Standards for Educational and Psychological Tests and Manuals may be 
obtained by sending $1.00 to the American Psychological Association, 1200 
Seventeenth Street, N.W., Washington, D.C. 20036. 

† Reliability refers to the test's consistency in measuring whatever it does measure. 

‡ A norm is a way of describing, by statistical methods, the test performances of 
specific groups of students of various ages and/or grades. 


Testing and Scoring Procedures 

The most obvious and primary consideration when discussing testing
procedures is the need for rigorous adherence to standardized testing con- 
ditions as outlined in the test manual. It is unfortunate but true that many 
teachers and sometimes even counselors violate this basic rule. Almost all 
test manuals are quite explicit about the need to read the directions verbatim 
and follow the time limits with precision. (For an excellent discussion of 
this and other group testing procedures see Thorndike, 1949.) 

It is impossible in a book such as this to outline all the various techniques 
in test administration and scoring of all psychological tests. For example, 
certain tests such as the Stanford-Binet and the Rorschach require 
specialized training, including special courses, texts, and intensive supervision.
The primary purpose of the following discussion is to assist school
teachers and counselors who will be called upon to administer and score 
group standardized tests. 

Administering the Test
The Test Administrator 

The administration of a group test is not a complicated procedure. Any 
teacher and most secretarial personnel can be trained to perform this function 
through an in-service program. (See Karmel, 1965.) The relative simplicity
of administering group tests leads many teachers and school administrators 
to the erroneous conclusion that little advance preparation is necessary. 
This, of course, is not true. No matter who administers the tests and no
matter how many years of testing experience he may have had, the examiner
needs to know the peculiarities of the specific tests he is to administer. An
in-service program attended by all faculty members during the first month
of the school year should promote good test administration. 

The first session should include a general orientation that encompasses
the reason that tests are given; what they mean; and their specific implications
for pupil growth and educational development. The time period for
this phase of the session should be no longer than forty minutes, with at
least fifteen minutes devoted to questions and discussion. After approximately
forty minutes an informal coffee break to continue the discussion is
a good idea.

A second session for faculty who are to administer and proctor* the tests
should also be planned. This session should be scheduled within two to
five days of the school testing. The purpose of this session is to
familiarize the test administrators thoroughly with the specific
tests to be used. Stress should be placed on directions for the administration
of the test, especially standardizing procedures. The
following test procedures should be emphasized.

* A proctor is an assistant to the test administrator who helps by passing out test
materials, keeping order, and answering student questions.

1. Test directions should always be followed without any deviation.

This means that the test directions should not be altered
even slightly. Be sure to understand the importance
of reading directions verbatim. No matter how good your memory may be,
never rely on it when administering a test.

2. Student questions should be answered within the context of the
test directions.

This may mean going over practice examples to clarify the directions before
the test begins. For example, look in on the
administration of an arithmetic test. Miss Hart is reading the directions.

Miss Hart: In this test you will have an opportunity ... Look at your
test booklet ...

Student: ... Never mind, I understand.

Miss Hart: ... Are there any questions? (Miss Hart looks around,
not only for raised hands but for possible problems by the expression of
the students. Seeing that there are no questions Miss Hart continues
her reading of the directions.) Remember to choose the answer you
think is right and circle the letter next to it. Do not spend too much
time on any one problem. Any questions? (Miss Hart looks around
once again.) All right, begin.

During the testing session Miss Hart and her proctors maintain a constant
vigil, always available to help any student with a problem. Miss Hart knows
that once a test begins questions are not to be encouraged. She knows, of
course, that neither she nor her proctors may assist a student on specific
items or provide clues as to the correctness of a pupil’s response.

Let us return to Miss Hart and the test setting. A student has just raised 
his hand and Miss Hart has silently gestured for him to come to her desk. 

Student: I don't understand the meaning of this question. 

Miss Hart: You don’t understand the meaning of the question? 
Student: No, I don’t. 

Miss Hart: You don’t? 

Student: Well, what I mean is I don’t remember how to do this
part of it. Can you just tell me what (student points to a section of a
test problem) I should do here?

The reader should note that up to this point Miss Hart has been non- 
directive, that is, she has reflected the student’s questions. To do more 
would obviously have entailed answering the test question or giving important
cues which would have the same result. Now, however, Miss Hart
must be directive and make it quite clear to the student that he will have to
work out his own problems:

Miss Hart: Jim, I am very sorry. (Miss Hart smiles warmly.) I
can’t answer your question. If you are stuck on that question go on
to the next item. Just do the best you can.

3. Time limits must be strictly observed. 

“Don’t ever trust one clock,” said an experienced teacher-tester when I 
administered my first tests in the schools. They never told me that in 
graduate school nor do I recall reading it in any measurement text but let 
me pass on her advice. It really does make sense. Your watch may stop 
running but more likely you may be holding it and inadvertently change
the time (I have seen that happen). Or if you rely on the school clock the 
electricity could be temporarily shut off and there you are in the middle of 
an examination not knowing “the time of day.” The following are two basic 
rules that should be of help: (a) if a test has sections with short time limits,
five minutes or less, each examiner should have a stop watch to ensure
accurate timing; (b) most tests will only require an ordinary watch with a
second hand. When using a watch write down the time you begin and the
time testing is to stop. (Remember to use a second time piece such as the
school clock to insure reliability.)
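
The advice to write down the time you begin and the time testing is to stop can be illustrated with a short sketch. The function names and the five-second tolerance between the two timepieces are assumptions made for this illustration; they do not come from any test manual.

```python
from datetime import datetime, timedelta


def stop_time(start, limit_minutes):
    """Return the clock time (HH:MM:SS) at which testing must stop."""
    begin = datetime.strptime(start, "%H:%M:%S")
    return (begin + timedelta(minutes=limit_minutes)).strftime("%H:%M:%S")


def timepieces_agree(elapsed_watch, elapsed_clock, tolerance_seconds=5):
    """Compare elapsed seconds read from two independent timepieces.

    The tolerance is an assumed value chosen for illustration only.
    """
    return abs(elapsed_watch - elapsed_clock) <= tolerance_seconds


# A 35-minute test that begins at 9:05 must stop at 9:40.
print(stop_time("09:05:00", 35))      # 09:40:00
# The watch and the school clock differ by 3 seconds, within tolerance.
print(timepieces_agree(2100, 2103))   # True
```

Recording both times before testing begins means that a stopped watch or an interrupted school clock can be detected against the other timepiece rather than discovered after the fact.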

4. The examiner and his assistants should check, infrequently, on 
the progress of the examinees. 

The word infrequent is used because there is often too much circulating 
around the room by proctors. In a great many cases this does not serve the 
interest of the students and tends to make them more anxious. On the other 
hand, some "circulating” is necessary. The best procedure, of course, to 
assure student compliance with the test format is to be sure all students 
understand what is expected of them and how they are to respond to the 
test items before testing begins. 

A few minutes after testing has begun the examiner and proctors should 
silently move around the room to check that students are working on the 
correct pages and that they are marking their responses in the appropriate 
place. After the proctors have completed their "rounds,” they should return 
to strategically located posts where they may be available to help individual 
students. They should not circulate again until a new test or subtest is begun.

A final note of caution is in order. Remember it is not the duty of the 
examiner, nor is it necessarily beneficial to the examinee, to encourage or 
prompt students during the test. An exception to this rule is in the examination
of young children, where it may be necessary to encourage children in
the first six grades to keep working or to check their work after they have 
finished. 

The in-service training program should be a yearly occurrence. In most 
schools major testing occurs in the fall. If tests are also given in the spring, 
it is not necessary to repeat both sessions. However, a refresher review is 
desirable, followed by concentration on the tests to be given in the spring. 
Only those faculty members who will be administering the tests need be
involved. The spring session should be no longer than one class period, or
approximately forty-five minutes. 


Physical Setting

Thorndike and Hagen (1969) list four desirable conditions for testing.
They state that the subject should be “(1) physically comfortable and emotionally
relaxed, (2) free from interruptions and distractions, (3) conveniently
able to manipulate their test materials, and (4) sufficiently separated to 
minimize tendencies to copy from one another” (p. 542). 

Anastasi (1968) suggests a room free from distraction, with adequate
lighting and ventilation. Although all these recommendations seem
appropriate, they really do not seem to affect test scores. For example, the
author worked one summer in a test center at a major university where 
construction was going on and the temperature was over 95 degrees. After
completing the test (a graduate admissions examination) a designated 
committee of these students approached the test director to complain about 
the hideous conditions of testing. Their complaint was so obviously justified 
that the director agreed to allow them to take the same examination in another 
setting on the following day. The new setting was in an air-conditioned 
building free of noise or other distractions. A comparison of test scores 
indicated few significant changes. In fact, two students scored several 
points lower on the second administration. 

Super, Braasch, and Shay (1947) presented a number of distractions to 
graduate-student groups during the administration of a vocational and 
scholastic aptitude test. The experimental group was subjected to trumpets 
blaring in the next room, sudden opening of the door by “irate” students 
who would then argue noisily outside the door; and a timer that went off 
five minutes early. The experimenter then told the annoyed students to go 
on for five minutes more. Guess what happened? There were no significant 
differences in test scores between the experimental group with all of the 
distractions and the control groups that completed the tests under “ideal” 
conditions. 

It is the feeling of this writer that psychological conditions are more 
important than the physical setting. Nevertheless, one should strive for
optimal testing conditions. The following are nine conditions that, although
not always possible to achieve, are desirable. 

1. Maintain adequate lighting and ventilation (as good as those in 
the classroom setting). 

2. Try to use the classroom for testing, especially with young
children.

3. Post signs on the doors indicating that testing is in progress.

4. Make arrangements with the administration to suspend the
class bells, fire drills, or public announcements.

5. See to it that everyone has had a chance to use the bathroom.
(This is especially important with young children.)

6. See that each examinee has two usable pencils, and that there
are extras for those who will need more.

7. Have desks or tables that facilitate the manipulation of test
materials.

8. Separate the children so that they are not tempted to
look at a neighbor’s paper. 

9. Have at least one proctor for every twenty-five students. 

Psychological Setting

The psychological climate is of primary importance; much of it is dependent
on the physical conditions and the test administrator’s ability to
establish rapport. The psychological setting varies with the attitude of the
examiner. For example, is the examiner a threatening or supporting figure?
Is he the kind of person students rebel against or want to please, or are they
indifferent to him?

The various methods for achieving rapport will differ somewhat according 
to the type of test and the ages and grades of the students. Preschool and
primary school children (nursery through third grade) especially need to be
treated in a warm and friendly manner. The examiner should be relaxed and
cheerful so that the children are not threatened by the test. Children at
this level should enter testing with feelings similar to those they feel when a
new game is initiated. On the other hand, the older school child should be 
treated more realistically. That is, he should be told once again (a pretest 
orientation has already exposed him to the reasons for testing) to do his 
best and that the examination is to help him. It is always helpful to state 
that “hardly anyone ever finishes or is able to answer all the questions 
correctly.” 

The importance of the psychological setting has been demonstrated in 
numerous research studies. (See Sarason, 1950; Sacks, 1952; Wickes, 1956; 
Sarason and others, 1960.) These studies have revealed that testing must be 
interpreted in the light of the test situation and that good relations with the 
examiner produce better test results. For example, Wickes (1956) found 
that the examiner’s behavior had a significant effect upon test scores. It
has also been shown that it is beneficial to know the children before testing 
them. Sarason and his associates (1960) theorize that some children perceive 
the “objective” examiner as a rejecting figure because of their own depen- 
dence needs. 

The preceding factors make it very difficult to suggest concrete guidelines 
for producing the right psychological test atmosphere for all children and
for all test situations. There are some general rules, however, that would be 
appropriate for most testing situations and most students. 


1. A pretest orientation for the students should be scheduled. 
This meeting should include the purposes and reasons for testing. 
All students who will be tested should attend.

2. The test administrator should maintain a relaxed and empathetic
manner throughout testing. Directions should be read in a warm
(not sugary) and clear voice.

3. The examiner should convey his objectivity to the examinees
without coldness or aloofness.

4. The test administrator and his proctors should refrain from
any autocratic or authoritarian manner. At the same time the students
must understand that testing personnel are in charge and that certain
safeguarding procedures must be carried out.

5. Every effort should be made to provide the best physical
conditions for testing.


Scoring Procedures

The primary consideration of teachers and counselors in test scoring is
economy of time. Few things make school personnel more hostile to testing
than the laborious hand scoring of tests. Teachers and counselors should
not have to be concerned with this clerical task.** Their time can be spent
more profitably in other educational endeavors; moreover, there are more
accurate and reliable methods of scoring. Almost every section of the country
has access to commercial test-scoring services, which are located at university
test centers or private agencies. If the school cannot afford such services
there are many other methods available. Let us briefly review some of these.


Scoring Stencil 

The scoring stencil is a cardboard answer sheet with the correct answers 
punched out. It is applicable to tests with a multiple-choice or true-false 
item format. The stencil is placed over the answer sheet and the number 
of black pencil marks that are visible is equal to the number of correct 
answers obtained. That is, the placement of the stencil over the student’s 
answer sheet immediately reveals the number of right answers. Some sample 
items from a test that uses a scoring stencil** may be seen in Figure 1. 
This test is concerned with “word meaning”; students are requested to 
read the beginning part of each sentence and decide which word is best. 
They are then instructed to fill in the circle which has the same number as 
the one they have chosen. 
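
The logic of the scoring stencil, counting the marks that show through the holes punched at the correct answers, can be sketched in a few lines. The key below uses the answers marked in the Figure 1 sample items (items 13 to 16); the student's responses and the function name are hypothetical and serve only as an illustration.

```python
# Keyed answers for the four sample items shown in Figure 1.
KEY = {13: 3, 14: 8, 15: 3, 16: 5}


def score_answer_sheet(key, marks):
    """Count the items whose marked choice matches the keyed choice.

    An item left blank (absent from marks) never matches, just as no
    pencil mark would show through the stencil hole.
    """
    return sum(1 for item, correct in key.items()
               if marks.get(item) == correct)


# Hypothetical student: item 15 is answered "radius" (choice 1), which is wrong.
student = {13: 3, 14: 8, 15: 1, 16: 5}
print(score_answer_sheet(KEY, student))   # 3
```

A real stencil would also show a mark if a student filled in every circle for an item, so hand scorers normally scan the sheet for multiple marks first; this sketch assumes one mark per item.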

The Self-scoring Answer Sheet

The self-scoring answer sheet is another method of scoring found in some 
tests. An example of this type may be seen in Figure 2. In this example from 
the Kuder Preference Record a pin-punch answer pad is used. The student 
is given a metal pin which he uses to punch holes in circles on the answer 
sheet. The inside of the answer booklet contains printed sets of circles. The 
student’s score is computed by tallying the number of holes punched in 
the circles. 

A variation of the same principle is found in the self-scoring carbon pad.
This form requests the student to mark his responses on the outside of an
answer booklet with a pencil. The booklet is self-contained and cannot be 
opened without tearing. Squares or circles underneath the correct answers 
record the responses through the carbon backing onto the answer sheet. 
Scoring is accomplished by counting the number of marks in the squares or 
circles. 


** This is not true, of course, when primary teachers might want to check
for difficulties in certain tasks and gear their future teaching accordingly.

** Items may also be answered on a separate answer sheet for machine scoring.

13  ... gathered for religious worship is a ____
    1 colony    2 convention    3 congregation    4 committee
         1 2 3 4
    13   ○ ○ ● ○

14  To seek is to ____
    5 find    6 see    7 settle    8 search
         5 6 7 8
    14   ○ ○ ○ ●

15  A line passing through the center of a circle and
    with its ends on the circle is called a ____
    1 radius    2 diamond    3 diameter    4 diagonal
         1 2 3 4
    15   ○ ○ ● ○

16  If you are daring but unwise, you are considered
    to be ____
    5 foolhardy    6 awkward    7 shameful    8 noisy
         5 6 7 8
    16   ● ○ ○ ○


Figure 1. Sample questions from Test 1, Word Meaning, of the
Stanford Achievement Test, Intermediate II, Form W. (Reproduced
from the Stanford Achievement Test, Intermediate II,
Form W, © 1964 by Harcourt, Brace & World, Inc. All rights
reserved. Reprinted by permission.)


Machine Scoring 

The scoring of a large number of answer sheets by machine is faster and 
requires less manpower than scoring by hand. There are many machine-scoring
plans from which to choose. Figure 3 illustrates an IBM-type
answer sheet that has been used widely in the last twenty years. The IBM 
answer sheet may be scored by hand or by the 805 International Test 
Scoring Machine. Special electrographic pencils are needed for this answer 
sheet. Figure 4 reveals the newer MRC answer sheet, also used in machine 
scoring. The MRC answer sheet is scored on electronic test equipment at
Measurement Research Center, Iowa City, Iowa. Ordinary soft-lead pencils 
are used with this answer sheet. 

The school district should make its choice of scoring method on the basis 
of individual needs and financial capacity. The following are some of the
major plans. 

Package Plan. A school district contracts with a test publisher for tests,
answer sheets, scoring and distribution (statistical analysis) of scores. This
plan, though very convenient, sometimes leads schools to purchase tests
they normally would not use. Important test factors, such as the suitability
of the test for the local system, are often overlooked because of the
administrative ease of the plan.

Figure 2. Front and inside of a portion of the Pin-Punch Answer Pad of the
Kuder Preference Record, Vocational Form CH. (From Answer Pad for the Kuder
Preference Record by G. Frederic Kuder. Copyright © 1948 by G. Frederic Kuder.
Reprinted by permission of the publisher, Science Research Associates, Inc.,
Chicago, Illinois.)

Test-Scoring Plan. The test-scoring plan leaves the school district a great 
deal more freedom to choose tests. A school makes arrangements for test 
scoring with a test publisher, university testing bureau, or a company such
as Testscor whose primary business is scoring tests. It is also possible to
make arrangements with most of these companies to obtain a statistical 
distribution of your school’s results. 

Test-Scoring Equipment. Test-scoring equipment involves renting or
purchasing machines to score tests at the local school district. Smaller school
districts often join together to share the cost and use of the machines. This
plan is the most inexpensive if the machines are kept busy. There are several
problems, however, with this plan. First of all one must train personnel in
the use of the machine, and secondly the problem of machine breakdown
can cause delays and expense. In addition, one must add to the cost of the
machine itself the expense of special help in the form of machine operators
and clerical assistants.

Figure 3. A section of an IBM-type answer sheet. (Harcourt, Brace & World, Inc.)

Figure 4. A section of an MRC answer sheet.

Scoring Errors 

No matter what type of scoring method is utilized one must constantly be
alert to the possibility of error. This means that the original machine scoring
should always be rechecked by randomly selecting and rescoring by hand
approximately every twenty-fifth paper. The reader should note that “every
twenty-fifth paper” is an arbitrary choice; every tenth or thirty-fifth paper
will also do. Each hand-scoring operation should always be independently
checked. This means every hand-scored answer sheet must be scored at
two different times to assure accuracy. Ideally this involves two different
scorers; however, this is sometimes very difficult to manage. If two different
scorers cannot be arranged then the next best thing is to rescore the papers
on a different day. Rescoring does not mean a simple checking of counting
and addition. It means carrying out all scoring operations from placing the
key over the student’s responses to deriving his score.
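
The two checks described above can be sketched as follows: drawing a random sample of roughly one paper in twenty-five for hand rescoring, and comparing two independent scorings of the same papers. The function names, paper numbers, and scores are hypothetical and chosen only for illustration.

```python
import random


def papers_to_recheck(paper_ids, every=25, seed=None):
    """Randomly pick about one paper in `every` for hand rescoring."""
    ids = list(paper_ids)
    if not ids:
        return []
    # At least one paper is always rechecked, however small the batch.
    count = min(len(ids), max(1, round(len(ids) / every)))
    return sorted(random.Random(seed).sample(ids, count))


def scoring_disagreements(first_pass, second_pass):
    """Return the paper ids whose two independent scorings differ."""
    return sorted(p for p in first_pass
                  if first_pass[p] != second_pass.get(p))


# From a batch of 100 machine-scored papers, four are drawn for rescoring.
print(len(papers_to_recheck(range(1, 101), every=25, seed=1)))   # 4

# Two scorers (or one scorer on two different days) score the same papers.
first = {1: 40, 2: 35, 3: 28}
second = {1: 40, 2: 36, 3: 28}
print(scoring_disagreements(first, second))   # [2]
```

Any paper that appears in the list of disagreements should be scored a third time from the beginning, since the comparison shows only that an error occurred, not which scoring is correct.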

Personnel assigned to scoring tests should be well trained for the specific 
operations needed in the tests to be used. They should be cautioned to be 
especially careful in adding part scores (that is, adding different subsections)
to make a total score and going from raw* to converted scores.** Remember
the old saying that “a chain is only as strong as its weakest link"; this is 
also true in testing. The most sophisticated research, test design, and other 
important test construction factors are rendered meaningless if the scorer 
makes a simple mistake in addition. 
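
The two error-prone steps named above, adding part scores and converting a raw score, can be illustrated with a brief sketch. The part names, score maxima, and norm-group scores are made up for this example, and the percentile-rank formula shown (scores below plus half the ties) is one common definition, not necessarily the one used in any particular test manual.

```python
def total_raw_score(part_scores, part_maxima):
    """Add part scores after checking each against its possible range."""
    for name, score in part_scores.items():
        if not 0 <= score <= part_maxima[name]:
            raise ValueError(f"{name} score {score} is out of range")
    return sum(part_scores.values())


def percentile_rank(raw, norm_scores):
    """Percent of the norm group scoring below raw, counting ties as half."""
    below = sum(1 for s in norm_scores if s < raw)
    equal = sum(1 for s in norm_scores if s == raw)
    return 100.0 * (below + 0.5 * equal) / len(norm_scores)


# Hypothetical part scores and maxima for a two-part test.
print(total_raw_score({"verbal": 22, "nonverbal": 18},
                      {"verbal": 30, "nonverbal": 25}))   # 40

# Hypothetical norm group of ten raw scores.
norms = [10, 12, 15, 15, 18, 20, 22, 25, 27, 30]
print(percentile_rank(18, norms))   # 45.0
```

The range check catches the kind of slip the text warns about, such as a part score copied from the wrong column, before it is carried into the total.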


Factors Affecting Test Performance 


Motivation 

In testing ability the a priori assumption is always made that the person
being examined is “doing his best.” If the conditions of uniformity of testing
are to be maintained, every person should be motivated and expected to
do his best. The importance of motivation to the examinee’s test behavior
has been demonstrated in a number of studies. Incentive studies offering
rewards for submission to authority figures have failed to produce significant
score increases on ability tests compared to scores earned in a regular test
setting. (For example, see Benton, 1936.) On the other hand, when the student
is concerned over his test score, increases of scores may be seen. (See Hurlock,
1925; Gustad, 1951; Belts, 1951; and Flanagan, 1955 for interesting
investigations and discussions of the problem.)

* Raw score is the number of correct answers.

** Converted score is the symbolic representation of the raw score translated into,
for example, percentiles or stanines.

The emotional climate of the test setting and personality problems of 
students being examined may influence the motivation of some students. 
Gordon and Durea (1948), for example, administered the Stanford-Binet 
to eighth-grade students and retested these same children two weeks later 
in an atmosphere designed to lower self-esteem and produce discouragement. 
The second tests revealed significantly lower scores than another group of 
eighth-grade children who were retested under normal conditions. Goldman
(1961) states that 


... clients who see the forthcoming test as potentially useful to them
but not threatening are likely to exert optimal amounts of effort.
Lack of effort may come from various sources: In some cases it may
represent lack of interest in the test and lack of any expectation that it
has something of value to offer. In other cases, lack of effort may
represent just the opposite perception, that the tests are terribly
important and even threatening. In such a case lack of effort may play
a defensive role, permitting the individual to say afterward, “I didn’t
really try, because I wasn’t interested in the results of this test, and
they won’t affect my plans in any way” [p. 115].

McClelland (1966), after investigating experimentally the role of motivation
in achievement and test scores, reports that regardless of the innate ability 
of a person he is not going to obtain a high intelligence test score if he has 
little or no motivation to learn. Regarding intelligence test scores he states, 


There are two places where motivation enters into an intelligence 
test score: one in the accumulation of knowledge that the subject 
shows on the intelligence test or achievement test, and the other in 
the attention he gives at the time he takes the test. We know that
people who have high achievement motivation will actually do better 
in the testing situation. So there is an intertwining here of achieve- 
ment motivation and the intelligence measure [p. 537]. 


Many studies have revealed that achievement motivation is related to 
social values and child training. These in turn relate to ethnic group, 
religion, social class, and other factors. (See McClelland, Atkinson, Clark, and
Lowell, 1953; McClelland, 1955; Rosen, 1956; Winterbottom, 1958.) 
These studies focus on the importance of a child’s background in determining 
the proportion of abilities he will use in a test situation. 

An analysis of many of the investigations indicates that a family environment
which stresses early responsibility and independence of children as
well as the social values of competition and hard work leads to achievement
motivation. Atkinson (1957), in his investigation of “risk-taking behavior,” found that
an individual who feels there is little chance for advancement may reveal
correspondingly low levels of motivation. The implications from these
findings, if valid, are especially relevant in testing the culturally disadvantaged.
That is, there is a strong possibility that disadvantaged groups may do
poorly on tests in part because they lack the incentive to capitalize on those
talents that they do have.

The student of measurement should pay close attention to the role of
motivation in testing. Special caution is especially indicated in interpreting 
the test scores of emotionally and culturally disadvantaged youngsters. 

On the other hand, it is also obvious that most American school children
are not disadvantaged and are generally sophisticated about tests and
motivated to do well in academic and test situations. These children are 
easily motivated and their cooperation in testing situations is not difficult 
to obtain. Therefore, if the classroom teacher follows the instructions of the 
test manual and is psychologically supportive of her pupils, it is fairly safe 
for her to assume that most of her students will be doing their best. The 
teacher who is in a disadvantaged setting will, of course, want to stress the 
positive importance of testing and to provide optimal motivational conditions.
She will also want to consider motivational factors in her analysis
of the test results. 

Anxiety 

Test anxiety and free-floating anxiety** are closely related to test-taking
motivation. The highly motivated student, for example, may be so anxious
to do well that his very desire may interfere with a good score. The person
who is tense makes errors he would not normally commit. The student of
measurement knows from his own personal experiences and observations
that there is a great deal of anxiety and tension during testing. 

DeLong (1955) studied the behavioral reactions of elementary students 
in a normal classroom setting and during the administration of tests. He 
reported that during testing the children reacted in an anxious and disturbed 
fashion. 

The relationship of anxiety to test performance is not all one-sided. There 
are some studies that reveal that a degree of anxiety or tension enhances test
performance. Review the following research and compare it with your own
thinking on this subject. 

** In addition to emotional and cultural factors that may interfere with optimal
test performance in a school setting, special motivational problems are encountered
in testing prisoners or juvenile delinquents and mentally ill patients in institutional
settings. See Sears (1943); Rosenzweig (1949); Sarason (1954).

** Free-floating anxiety is a state of general uneasiness or dread for which there is
no objective reason.

A great many investigations into the problem of anxiety and test performance
have been made by S. B. Sarason and his colleagues (Mandler and
Sarason, 1952; Sarason and Mandler, 1952; Sarason, Mandler, and Craighill,
1952; ..., 1953; Sarason, Davidson, Lighthall, and Waite,
1958; Waite, Sarason, Lighthall, and Davidson, 1958; Sarason, Davidson,
Lighthall, Waite, and Ruebush, 1960). They developed a questionnaire to
ascertain test anxiety. This instrument* contained such questions as:


While taking a group intelligence test to what extent do you perspire?

Do you worry a lot before taking a test?

If you know that you are going to take a group intelligence test,
how do you feel beforehand?

While you are taking a test, do you usually think you are not doing
well?


Sarason, Davidson, Lighthall, and Waite (1958) in one study of 600
children in second to fifth grade found that test anxiety increased with grade
advancement. They also found that children with the greatest anxiety tended
to perform at a lower level. In an earlier study (Mandler and Sarason, 1952)
similar findings were reported; however, it was noted that the lessened performance
caused by high anxiety tended to be overcome with time. This
study and a later investigation (Waite, Sarason, Lighthall, and Davidson,
1958) led Sarason and his associates to conclude that test anxiety tended to
impair test performance. They also found that social class was a factor in test
anxiety. Upper- and middle-class subjects had less test anxiety than those
below. They interpreted this to mean that at higher social levels there is less
family pressure on intellectual achievement. Sarason and others (1960) found
that with elementary school children anxiety was more pronounced in verbal
than in nonverbal tests and was greater on new and different types of tests.

Sinick (1956) feels that “a high level of anxiety, whether existent or
induced in Ss, generally brings about impaired performance, but occasionally
causes improvement” (p. 317). Grooms and Endler (1960) feel that for some
individuals anxiety is an inhibitory factor that lowers performance.

After years of studying the problem of test anxiety Sarason and his
associates (1960) state, “The most consistent finding in our studies is the negative
correlation between anxiety and intelligence test scores: the higher the test
score on anxiety, the lower the IQ” (p. 270). They further state that the
reason is only partly because lower ability is the source of the anxiety. The
vast majority of subjects who made up the negative correlation were within
the average range of intelligence (90-110) and should not have academic
problems. They feel that “in the case of the intellectually average but anxious
child the estimate of potential based on conventional tests may contain
more error than in the case of most other intellectually average children”
(p. 270).

* It should be noted that there was more than one form of the basic questionnaire,
for example, one form for children and one for older students.

Goslin (1963) makes a very practical point when he says that the place of
anxiety on test performance is difficult to ascertain because often no one is
in a position to point to the child whose level of test anxiety is high. Those 
who have taught will readily see the truth of this statement. It is also interest- 
ing to note that Sarason’s data seem to reinforce Goslin’s statement. The 
problem is further complicated because people react differently to anxiety. 
There are certain highly anxious persons who demonstrate an exceptional 
mental alertness whereas others become intellectually frozen. 

What does all this mean to you, the potential test user? It means that
caution must be used. It is impossible to know each individual being tested
and to know whether a little tension would be appropriate. The following
“do’s” and “don’ts” are practically oriented to school, teacher, and child and
are based on research findings and personal experiences.

Do's

1. Do read the test manual's directions verbatim.

2. Do prepare students for testing either individually or in groups.
This orientation should include the purposes of testing along with the
uses that will be made of the results.

3. Do be alert to the influence of anxiety on test scores in individual
interpretations.

Don’ts

1. Don’t offer incentives of any kind.

2. Don’t orient students about testing on the day you plan to administer
the tests.

3. Don’t generalize and use anxiety as the only reason for low test
scores.

A final word of caution has to do with your attitude while administering
the test. Assume a matter-of-fact attitude.


Practice and Coaching 

It has been known for many years (See, for example, Casey, Davidson, 
and Harter, 1928) that IQ scores may be significantly increased if a child 
is coached on specific items he has missed and then given the same examina- 
tion again. Contemporary research (Dyer, 1953; Dempster, 1954; Longstaff, 
1954; Vernon, 1954; Wiseman, 1954; French, 1955; Lipton, 1956; French 
and Dear, 1959) reveals that coaching and practice may be of value to persons 
who have not had experience with a certain type of test or who have not had 
recent exposure to certain subject matter of a particular examination. The
effects of coaching and practice must be broken down into various situations
in order to have practical import for the test user. The following are questions
and answers based on the extensive research findings of over forty years. 



1. Are coaching and practice helpful in raising a person’s test score?

Answer: Yes, with certain qualifications.

2. What qualifications?

Answer: Qualifications dependent on specific individuals and
specific tests.


3. What about tests used to assign students to certain schools and 
grades? 

Answer: British psychologists have conducted many studies in
this area because of their concern over assignment of children to
different types of secondary schools. They found that improvement
of scores is dependent upon ability, the kind of tests, and the type and
amount of coaching given.

The investigations revealed that subjects with poor educational
backgrounds are more apt to benefit from intensive coaching than
those who have had excellent educational preparation. (See Wiseman
and Wrigley, 1953.)

4. That’s interesting, but I am especially interested in college 
entrance tests. Can you study for them? 

Answer: The College Entrance Examination Board, noting the
concern of parents in the United States over their children’s test
performance, has over the years conducted a series of studies to
determine the effects of coaching on its Scholastic Aptitude Test
(Dyer, 1953; French, 1955; College Entrance Examination Board,
Trustees, 1959; French and Dear, 1959). These studies reveal slight
increases in scores but none that are significant.

5. What do I advise my students? 

Answer: Advise students individually. If a student has not had
mathematics recently or has not been exposed to testing, a review of
the former and some practice tests may be of help. For most students,
however, the following statement by the College Entrance Examination
Board is appropriate:*


The trustees of the College Entrance Examination Board have noted 
with concern the increasing tendency of secondary school students to 
seek the assistance of special tutors or of special drill at school in the 
hope of improving thereby scores earned on College Board examina- 
tions. The Board has now completed four studies designed to evaluate 
the effect of special tutoring or “coaching” upon the Scholastic 
Aptitude Test, the basic test offered. Three other studies have been 
conducted independently by public high schools. These studies being 
completed, we now feel able to make the following statement:


* A Guide for Counselors and Admissions Officers.
Copyright 1966 by the College Entrance Examination Board. Used by permission
of the College Entrance Examination Board.




The evidence collected leads us to conclude that intensive drill for
the SAT, either on its verbal or its mathematical part, is at best likely
to yield insignificant increases in scores. The magnitudes of the increases
which have been found vary slightly from study to study, but they are
always small and appear to be independent of the particular method
of coaching used and of the level of ability of the students being 
coached. The results of the coaching studies which have thus far been 
completed indicate that average increases of less than 10 points on a 
600 point scale can be expected. It is not reasonable to believe that 
admissions decisions can be affected by such small changes in scores. 
This is especially true since the tests are merely supplementary to the 
school record and other evidence taken into account by admissions 
officers. 

The conclusion stated here has been reached slowly and with care, 
although the atmosphere in which the problem has been studied has 
not been entirely calm. In recent years newspapers and even radio 
advertisements advancing the claims of the drillmasters have increased 
in number and boldness. Parents, already disturbed by exaggerated 
notions of the difficulties of students in gaining admission to college, 
have demanded that the schools divert teaching energy and time to a 
kind of drill that is obnoxious to educators of every philosophy. 

With parental concern so great, each completed study yielding 
negative findings with regard to the usefulness of coaching has led 
only to speculation that under some other set of circumstances some 
other set of students might make important score increases as a result 
of coaching for the test. The time has come to say that we do not 
believe it. 

Tutors often show apparent good results mainly because students
and scores do change with the passage of time. Our studies have
simply shown that the scores of students who are left alone change in
the same directions and to nearly the same degree as do scores of
students who are tutored. The public, though, is disconcerted to see
any change in a measure of “aptitude” which is regarded as
unchangeable. As the College Board uses the term, aptitude is not
something fixed and impervious to influence by the way the child lives and
is taught. Rather, this particular Scholastic Aptitude Test is a measure
of abilities that seem to grow slowly and stubbornly, profoundly
influenced by conditions at home and at school over the years, but not
responding to hasty attempts to relive a young lifetime.

In addition to changes due to growth, other changes occur because
the test, while dependable, shares a characteristic common to all tests
in that it cannot be made to give exactly the same score for each student
each time the test is taken. Changes due to this lack of complete
dependability are uncontrollable. Thus, with scores being affected by
both the imperfect nature of the testing process and the student’s
growth, about one student in four will find that his scores actually
decrease from one year to the next, while most other students will
have small to moderate increases. About one student in fifteen* will
find that his scores increase by 100 points or more between junior
and senior years in high school, and this is true whether he is coached
or not. It is not surprising then that tutors are often able to point to
particular students who have made large increases in their scores.

It is possible to predict the size and number of fluctuations in scores 
that will occur within large groups, but fluctuations of individual 
scores cannot be predicted. Yet it is upon individuals that interest 
properly focuses, so that unexpected changes are easily, though
erroneously, attributed to coaching, to the school, or to some other visible
agency. 

We have said nothing about the tests of achievement in specific
school subjects. These have not been studied in the same way as has
the aptitude test. We do know that these tests do a modest but 
useful job of measuring learning of the material tested. We suspect 
that the question of coaching for these tests is a matter of choosing 
a method of teaching the subject. We cannot believe that drill on 
sample test questions is the most productive method available. 

Finally, we worry very little when parents of comfortable means
decide that at worst tutoring can do no harm and therefore use their
money for coaching toward College Board examinations. We are very
concerned when parents purchase coaching they cannot afford or,
failing to do so, feel that an unfair advantage has gone to those who
have had a few weeks or months of tutoring. But we are concerned
most, and have been moved to make this statement, because we see
the educational process unwillingly corrupted in some schools to gain
ends which we believe to be not only unworthy but, ironically,
unattainable.

6. Can you practice for intelligence tests?

Answer: A great many investigations reveal that, on the average,
scores will increase upon retesting if the same examination is used.
(See Crane and Heim, 1950; and Heim and Wallace, 1950.) If parallel
forms† are used, elevations in test scores are less pronounced. The
author’s experience suggests that, on the whole, most children and
adults tend to obtain the same general scores no matter how much
testing or practice they have had.


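* The College Entrance Examination Board, in a personal communication to the
author, states that fifteen has been extended up to twenty. That is, “About one student in
twenty ...”

† Parallel forms are equivalent versions of a test, of comparable difficulty, using different questions.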
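
The Board's two figures (about one student in four losing ground, about one in fifteen gaining 100 points or more) can be reproduced, at least roughly, by a very simple model. The sketch below assumes that junior-to-senior score changes follow a normal distribution. The mean gain of 31 points and the standard deviation of 46 points are illustrative values chosen here to fit the two quoted fractions; they are not figures reported by the Board.

```python
from statistics import NormalDist

# Hypothetical model of junior-to-senior score changes (assumed, not from the Board):
# growth shifts the average upward, and imperfect reliability spreads changes out.
MEAN_GAIN = 31.0   # assumed average gain, in score points
SD_CHANGE = 46.0   # assumed standard deviation of score changes

change = NormalDist(mu=MEAN_GAIN, sigma=SD_CHANGE)

# Share of students whose score goes down (change below zero).
p_decrease = change.cdf(0.0)

# Share of students who gain 100 points or more, coached or not.
p_gain_100 = 1.0 - change.cdf(100.0)

print(f"decrease: about 1 student in {1 / p_decrease:.1f}")
print(f"gain of 100 or more: about 1 student in {1 / p_gain_100:.1f}")
```

Under these assumptions, large individual gains appear with no coaching at all, which is the point the statement makes about tutors who point to particular students.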





Other Factors 

Brief statements on the following variables will be made not because they 
are unimportant, but because they are mentioned in different sections of the 
text and need not be elaborated upon at this time. 

Response Sets 

Response sets are a tendency to choose a certain direction in responding to
test items. For example, some people tend to answer no to all personal 
problems. Another type of response set is found in individuals who tend to 
guess freely, or who are afraid to guess (Goldman, 1961). 

Teachers, counselors, and psychologists should not be particularly concerned
with this factor, because it is the primary responsibility of the test
constructor. The test user should note, however, the tendency of some tests 
to produce response sets and weigh this factor before purchasing a given test. 
(For a detailed discussion of this factor, see Cronbach, 1950, 1960; Goldman, 
1961.) 

Cheating 

The problem of cheating is as old as mankind. It is obvious that if a person 
knows the answers in advance, looks at another person’s paper, starts before 
the signal to begin or finishes after time is called, his score will not be a 
reflection of his own ability and therefore will be an invalid evaluation.

One form of cheating is making oneself look “good” on interest and
personality inventories. It has been found that most of these tests lend themselves
to making false responses (Cross, 1950; Garry, 1953; Gehman, 1957;
Noll, 1951).

The best way to avoid cheating is to be sure that you have a good test 
orientation. This will help convince students that an honest score is in their 
own interest. Another method of avoiding cheating is to be sure that the 
examiner has enough assistance to observe the students being tested.


References 


American Educational Research Association and National Council on Measurements
Used in Education, Committee on Test Standards. Technical recommendations
for achievement tests. Washington, D.C.: National Education Association, 1955.

American Personnel and Guidance Association. Ethical standards. Personnel and
Guidance Journal, 1961 (Oct.), 206-09.

American Psychological Association. Technical recommendations for psychological
tests and diagnostic techniques. Washington, D.C.: American Psychological
Association, 1954.

American Psychological Association. Ethical standards of psychologists. American
Psychologist, 1963, 18, 56-60.



American Psychological Association, American Educational Research Association,
and National Council on Measurement in Education. Standards for educational
and psychological tests and manuals. Washington, D.C.: American Psychological
Association, 1966.

Anastasi, A. Psychological testing (3rd ed.). New York: Macmillan, 1968.

19S7'H°S7r’ Prj’cUoglral 


Benton, A. L. Influence of incentives upon intelligence test scores of school children.
Journal of Genetic Psychology, 1936, 49, 494-96.

Casey, M. L., Davidson, H. P., and Harter, D. I. Three studies on the effect of
training in similar and identical material upon Stanford-Binet test scores.
Twenty-seventh yearbook, National Society for the Study of Education, Part I, 1928.

Cattell, J. McK. Mental tests and measurements. Mind, 1890, 15, 373-80.

College Entrance Examination Board, Trustees. A statement by the college board
trustees on test “coaching.” College Board News, 1959, 5, 2-3.

Crane, V. R., and Heim, A. W. The effects of repeated retesting: III. Further
experiments and general conclusions. Quarterly Journal of Experimental
Psychology, 1950, 2, 182-97.

Cronbach, L. J. Further evidence on response sets and test design. Educational and
Psychological Measurement, 1950, 10, 3-31.

Cronbach, L. J. Essentials of psychological testing (2nd ed.). New York: Harper &
Row, 1960.

Cross, O. H. A study of faking on the Kuder Preference Record. Educational and
Psychological Measurement, 1950, 10, 271-77.

DeLong, A. R. Emotional effects of elementary school testing. Understanding the
Child, 1955, 24, 103-07.

Dempster, J. J. B. Symposium on the effects of coaching and practice in intelligence
tests: III. Southampton investigation and procedure. British Journal of
Educational Psychology, 1954, 24, 1-4.

Dyer, H. S. Does coaching help? College Board Review, 1953, 19, 331-35.

Ebbinghaus, H. Über eine neue Methode zur Prüfung geistiger Fähigkeiten und
ihre Anwendung bei Schulkindern. Zeitschrift für Psychologie, 1897, 13, 401-59.

Eells, K., et al. Intelligence and cultural differences. Chicago: University of Chicago
Press, 1951.

Flanagan, J. C. The development of an index of examinee motivation. Educational
and Psychological Measurement, 1955, 15, 144-51.

French, J. W. An answer to test coaching: Public school experiment with SAT.
College Board Review, 1955, 27, 5-7.

French, J. W., and Dear, R. E. Effect of coaching on an aptitude test. Educational
and Psychological Measurement, 1959, 19, 319-30.

Garry, R. Individual differences in ability to fake vocational interests. Journal of
Applied Psychology, 1953, 37, 33-37.

Gehman, W. S. A study of ability to fake scores on the Strong Vocational Interest
Blank for Men. Educational and Psychological Measurement, 1957, 17, 65-70.

Goldman, L. Using tests in counseling. New York: Appleton-Century-Crofts, 1961.

Gordon, E. M., and Sarason, S. B. The relationship between “test anxiety” and
“other anxieties.” Journal of Personality, 1955, 23, 317-23.




Gordon, L. V., and Durea, M. A. The effect of discouragement on the revised
Stanford-Binet scale. Journal of Genetic Psychology, 1948, 73, 201-07.

Goslin, D. A. The search for ability: Standardized testing in social perspective. New
York: Russell Sage Foundation, 1963.

Grooms, R. R., and Endler, N. S. The effect of anxiety on academic achievement.
Journal of Educational Psychology, 1960.

Gustad, J. W. Test information and learning in the counseling process. Educational
and Psychological Measurement, 1951, 11, 788-95.

Heim, A. W., and Wallace, J. G. The effects of repeatedly retesting the same group
on the same intelligence test: II. High grade mental defectives. Quarterly Journal
of Experimental Psychology, 1950, 2, 19-32.

Hurlock, E. B. An evaluation of certain incentives used in school work. Journal of 
Educational Psychology, 1925, 16, 145-59. 

Karmel, L. J. "Secretarial-Psychologist” — A new member of the psychological 
team? Journal of School Psychology, 1965, 4, (3), 64-67.

Karmel, L. J. Testing in our schools. New York: Macmillan, 1966. 

Kuhlmann, F. A. A revision of the Binet-Simon system for measuring the
intelligence of children. Journal of Psycho-Asthenics, Monograph Supplement, 1912, 1,
1-41.

Lipton, R. L. A study of the effect of exercise in a simple mechanical activity on
mechanical aptitude as is measured by the subtests of the MacQuarrie Test for
Mechanical Ability. Psychology Newsletter, NYU, 1956, 7, 39-42.

Longstaff, H. P. Practice effects on the Minnesota Vocational Test for Clerical
Workers. Journal of Applied Psychology, 1954, 38, 18-20.

Mandler, G., and Sarason, S. B. A study of anxiety and learning. Journal of
Abnormal and Social Psychology, 1952, 47, 166-73.

McClelland, D. C. The measurement of human motivation: An experimental
approach. In A. Anastasi (Ed.), Testing problems in perspective. Washington, D.C.:
American Council on Education, 1966. Pp. 528-38.

McClelland, D. C. (Ed.) Studies in motivation. New York: Appleton-Century-Crofts,
1955.

McClelland, D. C., Atkinson, J. W., Clark, R. A., and Lowell, E. A. The achievement
motive. New York: Appleton-Century-Crofts, 1953.

Noll, V. H. Simulation by college students of a prescribed pattern on a personality 
scale. Educational and Psychological Measurement, 1951, 11, 478-88. 

Rosen, B. C. The achievement syndrome. American Sociological Review, 1956, 21, 
203-11. 

Rosenzweig, S. Psychodiagnosis: An introduction to the integration of tests in dynamic
clinical practice. New York: Grune and Stratton, 1949.

Sacks, E. L. Intelligence scores as a function of experimentally established social
relationships between child and examiner. Journal of Abnormal and Social
Psychology, 1952, 47, 354-58.

Sarason, S. B. The clinical interaction, with special reference to the Rorschach. New
York: Harper & Row, 1954.

Sarason, S. B. The test-situation and the problem of prediction. Journal of Clinical 
Psychology, 1950, 6, 387-92. 

Sarason, S. B., Davidson, K., Lighthall, F., and Waite, R. A test anxiety scale for 
children. Child Development, 1958, 29, 105-13. 



Sarason, S. B., Davidson, K. S., Lighthall, F. F., Waite, R. R., and Ruebush, B. K.
Anxiety in elementary school children. New York: Wiley, 1960.

Sarason, S. B., and Gordon, E. M. The test anxiety questionnaire: Scoring norms.
Journal of Abnormal and Social Psychology, 1953, 48, 447-48.

Sarason, S. B., and Mandler, G. Some correlates of test anxiety. Journal of Abnormal
and Social Psychology, 1952, 47, 810-17.

Sarason, S. B., Mandler, G., and Craighill, P. G. The effect of differential
instructions on anxiety and learning. Journal of Abnormal and Social Psychology, 1952,
47, 561-65.

Sears, R. Motivational factors in aptitude testing. American Journal of
Orthopsychiatry, 1943, 13, 468-93.

Sinick, D. Encouragement, anxiety, and test performance. Journal of Applied
Psychology, 1956, 40, 315-18.

Super, D. E., Braasch, W. P., Jr., and Shay, J. B. The effect of distractions on test
results. Journal of Educational Psychology, 1947, 38, 373-77.

Terman, L. M. The measurement of intelligence. Boston: Houghton Mifflin, 1916.

Thorndike, R. L. Personnel selection: Test and measurement techniques. New York:
Wiley, 1949.

Thorndike, R. L., and Hagen, E. Measurement and evaluation in psychology and
education (3rd ed.). New York: Wiley, 1969.

Vernon, P. E. Symposium on the effects of coaching and practice in intelligence
tests: V. Conclusions. British Journal of Educational Psychology, 1954, 24, 57-63.

Waite, R. R., Sarason, S. B., Lighthall, F. F., and Davidson, K. S. A study of
anxiety and learning in children. Journal of Abnormal and Social Psychology, 1958,
57, 267-70.

Wickes, T. A., Jr. Examiner influence in a testing situation. Journal of Consulting
Psychology, 1956, 20, 23-26.

Winterbottom, M. R. The relation of need for achievement to learning experiences
in independence and mastery. In J. W. Atkinson (Ed.), Motives in fantasy,
action, and society. Princeton, N.J.: Van Nostrand, 1958.

Wiseman, S. Symposium on the effects of coaching and practice in intelligence
tests: IV. The Manchester experiment. British Journal of Educational Psychology,
1954, 24, 5-8.

Wiseman, S., and Wrigley, J. The comparative effects of coaching and practice
on the results of verbal intelligence tests. British Journal of Psychology, 1953, 44,
83-94.




Teachers, counselors, and others using tests often say, “Okay, I am convinced
testing is a good idea and a necessary part of our evaluation program.
But how do I know if the test is any good? I am not a testing specialist!”
These questions are pertinent and extremely important. This chapter will
be devoted to answering these and other questions relevant to test selection.

How to Evaluate a Test 

There are general considerations in test evaluation that are always important. 
The foremost of these is whether the test measures what it is supposed to 
measure. Next in importance is whether the test measures consistently and 
accurately. In addition to these factors one is always concerned about the 
“practical” aspects of the test. Is it convenient to use? Is it economical and 
easy to administer and interpret? What about the time factor?

Three criteria for evaluating a test have already been mentioned. They are
validity, reliability, and practicality. Let us now turn our attention to these
specific test selection factors. 


VALIDITY AND RELIABILITY

Validity


The most important variable in judging the adequacy of a measurement
instrument is its validity. Here the question is “What does the test measure?”
The test author's basic purpose in constructing the test, for example, may
have been to measure reading comprehension. The test buyer's concern is
that the test does, in fact, measure reading comprehension. If it does, to what
degree? If it does to a large extent, it is considered valid. To the degree that
the test measures something else (for example, spelling) its validity as a
measure of reading comprehension is impaired.

A test constructed with the aim of predicting success in high school algebra
is valid to the degree that those who achieve the highest scores on it also
achieve the highest grades in algebra. A test designed to measure artistic
aptitude is valid to the extent that it can distinguish between those who will
succeed and those who will fail in artistic endeavors.

The student of measurement should remember that validity is a matter of
degree. A test is almost never completely valid, nor is it usually entirely
invalid. The primary question, then, in selecting a test is how valid is it?
Will it serve your needs? If it does, to what degree?

Tests are used for different evaluative needs, and for each need a different 
method of investigation is necessary to establish validity. Different kinds of 
tests are used for various measurement purposes. It should be noted, however,
that the purposes of many tests overlap. For example, an intelligence test’s
major purpose may be to measure mental ability, but it also may be used for
determining personality aberrations and brain damage. Therefore, it is
necessary to gather different validity data within one test as well as securing
validity information on different types of tests. In our example three different
uses of an intelligence test were cited. Thus, validity data would be needed
in each of the three because of their differing goals. It is important to
remember that the nature of the data to be secured is dependent on the objective
or objectives of testing rather than on the kind of test (American Psychological 
Association and others, 1966). 

The three basic types of validity we will discuss are those agreed upon by
a joint committee of the American Psychological Association, American
Educational Research Association, and the National Council on Measurement
in Education (1966). They are (1) content validity, (2) criterion-related validity,
and (3) construct validity. National committees* and numerous measurement
textbooks,† in the past, have given different labels to essentially the same
forms of validity. It is hoped that there will now be some standardization in
terminology with the new pronouncements by this joint committee.

* American Psychological Association (1954); Committee on Test Standards of
the American Educational Research Association and others (1955). They suggested
four types of validity information, which they called content validity, concurrent
validity, predictive validity, and construct validity.

† Cronbach (1960) and others followed the national committees' labels;
others, such as Thorndike and Hagen (1961) and Noll (1965), applied different labels.
Thorndike and Hagen (p. 161), for example, used the terms “(1) represent, (2) predict,
and (3) signify.” Noll (p. 79), on the other hand, used the three types curricular, ...


Content Validity*

This area of validity is especially important to teachers. The teacher who
gives an examination which covers the materials and objectives of instruction 
within her classroom has probably given a test that has content validity. How 
does the teacher know that her test has content validity? Before an answer 
can be forthcoming certain questions must be asked. Consider for illustrative 
purposes a test in American history. What are the facts, skills, and concepts 
that have been stressed in the classroom? What are the curricular objectives? 
How does the content of the test match these? Does the test require
knowledge and insight beyond the scope of the course and the stated instructional
objectives? To answer these questions one must match test content against
course content. If the instructional goals of the course are represented in the
test, we may say the test is valid. It sounds easy, but is it? The analysis of the
test and the course is largely a “logical” and subjective judgment. In order 
to make our assessment as “rational” as possible we must have an itemized 
list of course objectives to compare to an itemized list of test content. 
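
The comparison of the two itemized lists can be sketched in a few lines of code. Everything in the example below is hypothetical: the objectives, the test items, and the tagging of each item with the objective it is judged to measure. That tagging remains the teacher's own subjective judgment; the code only does the bookkeeping.

```python
# Hypothetical itemized list of objectives for an American history course.
course_objectives = {
    "causes of the Revolution",
    "Constitutional Convention",
    "Bill of Rights",
    "westward expansion",
}

# Hypothetical test: each item number is tagged with the objective
# the teacher judges it to measure.
test_items = {
    1: "causes of the Revolution",
    2: "Constitutional Convention",
    3: "Constitutional Convention",
    4: "Bill of Rights",
    5: "Reconstruction",  # a topic not taught in this course
}

tested = set(test_items.values())

# Objectives that were taught but are not represented on the test.
untested_objectives = course_objectives - tested

# Items that require knowledge beyond the stated objectives.
out_of_scope_items = sorted(
    number for number, topic in test_items.items()
    if topic not in course_objectives
)

# Share of course objectives represented by at least one item.
coverage = len(course_objectives & tested) / len(course_objectives)

print("Untested objectives:", sorted(untested_objectives))
print("Out-of-scope items:", out_of_scope_items)
print(f"Objective coverage: {coverage:.0%}")
```

A tally of this kind does not establish content validity by itself, but it makes the gaps and the out-of-scope items visible before the judgment is made.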

It is important to remember that content validity is especially vital for 
achievement measures as well as for tests of adjustment based on observations. 
A standardized achievement test is judged as having content validity when its
content represents the curricular goals of those using the test. Whether the
test is a national standardized instrument or a local classroom test we can say it
has content validity only after we have asked ourselves: Do the tasks of this
test typify the educational objectives we feel are important in this area of
learning? Are they the educational objectives that we have stressed in our classroom
and school system? If the relationship is good, we may say the test is valid.

Test authors and publishers who produce tests for national use strive to
determine the generally accepted educational aims of instruction in the area
in which a test is to be constructed. The pronouncement on Standards† by
the American Psychological Association and others (1966) states that the
test manual should prove the claim that the content of the test is representative
of the assumed educational goals, tasks, and processes. They further state,

In the case of an educational achievement test, the content of the
test may be regarded as a definition of (or a sampling from a population
of) one or more educational objectives. The aptitudes, skills, and
knowledges required of the student for successful test performance
must be precisely the types of aptitudes, skills, and knowledges that
the school wishes to develop in the students and to evaluate in terms
of test scores (p. 13).

* It should be noted that both Thorndike and Hagen and Noll refer to the
Committee's labels in their discussion of each type of validity. Noll's
“curricular” validity and Thorndike and Hagen’s
“representing” validity are the same as content validity.

† Standards for Educational and Psychological Tests and Manuals (American
Psychological Association and others, 1966).

Test authors and publishers use many sources of information to establish
the content of their instruments. Among these are (1) contemporary and
widely accepted textbooks in the area to be tested; (2) experts in the subject
field; (3) course outlines from city, county, and state school offices; (4)
qualified teachers who actually teach the subject to be tested; (5) educational
specialists who train, prepare, and supervise teachers in the subject field
under consideration.

The test author, in constructing his test items, will draw from all the
available and relevant sources of information that relate to the subject matter of
his test. At the national level he is aware of variations in local school systems.
He attempts, therefore, to produce a test that can be widely used. Note that
the test producer attempts to accomplish these things. He is never completely
successful because educators differ, as do school systems. This is why it was
stated earlier that validity is a matter of degree. The prospective test buyer,
then, evaluates the content validity in terms of how closely the test items match
his particular educational and instructional objectives.

What to Look for in the Test Manual

The following guidelines should be of assistance in evaluating content
validity. Not all test manuals will include all these guidelines, but the better
tests and manuals will include a majority of them.

1. Description of the subject matter covered and the extent of the
sampling. For example, a manual of a test of English usage might
describe the types of items used and the range of subject matter as
well as illustrating to what degree responses to test items indicate
accomplishments in such areas as spelling, grammar, punctuation, and
so on.

2. A short resume of the credentials of specialists who have been
consulted to evaluate the appropriateness of questions and scoring
procedures, and a short description of how they arrived at their
judgments.

3. If the test items have been selected by a group of experts, the
manual should reveal the degree of agreement among them.

4. Statements in the manual that relate to sources of information
should be dated. Courses of study and methods of instruction change,
and what was a very good reflection of these yesterday may be a poor
shadow today.





Criterion-Related Validity 

Criterion-related validity is shown by comparing test scores with an
outside criterion or criteria, such as teachers' grades or job success. The
emphasis of this form of validity is prediction. It is very useful in the
selection and classification of individuals for admission to college,
the hiring of employees, and the assignment of soldiers to various military
specialties. This form of validity is established by the use of an expectancy
table or, in most cases, a correlation⁵ between the test score and a criterion
measure. The criterion is considered to be a specific measure of what is to
be tested. For example, how well do College Board Scholastic Aptitude Test
scores relate to freshman grades in college? A high positive correlation would
demonstrate criterion-related validity. Let us look at some of the methods
used in establishing criterion-related validity.


Expectancy Table 

The expectancy table is not used as much today as it has been. It is a simple
device that can be used to communicate criterion-related validity to test
users with little statistical knowledge. The expectancy table is a grid (see
Figure 5) containing a number of cells with test scores along the side; at the
top are final course grades, supervisor’s ratings, or any other criterion of
success that is desired. For each person a tally is placed which shows, vertically,
his test score and, horizontally, his rank on the criterion. After the completion
of the tallying, the tallies in each cell are added and recorded in the cell. The
figures in each row of cells are added and the sum is placed at the right of
each row; the numbers in each column are then added and the sum is
recorded at the bottom of each column.

Table 1 presents the data from Figure 5. The count in each cell has been
converted to a percentage of the total number of tallies in its row.
Thus, of the twenty-two cases with scores between 60 and 69 on the Sentences
test, 23 per cent (five) earned a grade of A, 63 per cent (fourteen) earned a B,
and 14 per cent (three) earned a C. Not one of the cases in this group received
a grade lower than C. From this data one might state that future students in
rhetoric who attain scores of 60 to 69 on the DAT Sentences Test will probably
be better than average students, because only 14 per cent earned grades lower
than A or B. Similar interpretations may be made for other scores and
individuals (Wesman, 1949).
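The row-by-row conversion described above can be sketched in a few lines of Python. This is only an illustration, not Wesman's procedure: the function name `row_percentages` is hypothetical, and the only data used are the tallies quoted in the text for the 60 to 69 score interval. Ordinary rounding gives 64 per cent for the B cell; the published table reports 63, presumably so that the row totals exactly 100.

```python
def row_percentages(counts):
    """Convert one row of grade tallies into whole-number percentages of the row total."""
    total = sum(counts.values())
    return {grade: round(100 * n / total) for grade, n in counts.items()}

# Tallies quoted in the text for scores of 60-69 on the Sentences test:
# three C's, fourteen B's, five A's, and no D's or F's (22 cases in all).
row_60_69 = {"F": 0, "D": 0, "C": 3, "B": 14, "A": 5}

percents = row_percentages(row_60_69)
for grade, pct in percents.items():
    print(grade, pct)
```

The same function could be applied to every row of the grid to produce the right-hand half of an expectancy table.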

The chief limitation of the expectancy table is that it reveals the predictive 
value of only one predictor at a time. A great many decisions in college 
admissions and guidance are made on the basis of more than one predictor. 
The double-entry expectancy table is useful when decisions are made on the
basis of two predictors.

⁵ A statistical measure of relationship between two or more variables. (See
Chapter 5 for a thorough discussion of correlation.)



[Figure 5: expectancy grid. Rows: DAT Sentences Test score intervals from 80-89 down to 0-9; columns: grades in rhetoric; cells: tally marks with their counts, with row and column totals.]

Figure 5. Expectancy grid showing how students’ grades in rhetoric and
previously earned scores on the DAT Sentences Test are tallied in appropriate cells.
(From A. G. Wesman, Expectancy tables: A way of interpreting test validity. Test
Service Bulletin, No. 38. New York: The Psychological Corporation, 1949.)


Table 2 shows the use of a double-entry expectancy table with 294 junior
high school boys and girls. The Academic Promise Tests (APT) were
administered at the beginning of a course in science. After the completion of the
course, the grades received were: 31 A’s, 65 B’s, 123 C’s, 40 D’s, and 35 E’s.
Raw scores (number right) on the APT numerical section and APT language
usage section were used in constructing the table. The numbers in the cells
reveal how many pupils in each of the two-test category groups received each
of the five grades. For example, the number 5 at the top of the upper right-
hand cell signifies that five pupils whose APT numerical score was 40 or
higher and whose APT language usage score was also 40 or more received A’s
in their science course (Wesman, 1966).

The advantages of the double-entry expectancy table are similar to those of
the single-entry table. Double-entry tables are easy to prepare and understand
and require little statistical knowledge. The basic advantage of the double-
entry table is that it allows simultaneous display of relationships between two
predictors and a criterion.
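How pupils are sorted into the cells of a double-entry table can be sketched as follows. This is a minimal illustration under stated assumptions: the pupil records are invented, the function names are hypothetical, and the score bands (below 20, 20 to 39, 40 or above) are assumed for the example; only the cutoff of 40 appears in the text.

```python
from collections import Counter, defaultdict

def score_band(score, cutoffs=(20, 40)):
    """Return 0 for scores below 20, 1 for 20-39, 2 for 40 or above (assumed bands)."""
    return sum(score >= c for c in cutoffs)

def double_entry_table(records, cutoffs=(20, 40)):
    """Tally grades in cells indexed by (numerical band, language usage band)."""
    table = defaultdict(Counter)
    for numerical, language, grade in records:
        cell = (score_band(numerical, cutoffs), score_band(language, cutoffs))
        table[cell][grade] += 1
    return table

# Invented records: (numerical score, language usage score, course grade).
records = [
    (45, 42, "A"), (41, 44, "A"), (43, 25, "B"),
    (22, 41, "B"), (18, 30, "C"), (12, 9, "E"),
]

table = double_entry_table(records)
# Grades earned by pupils scoring 40 or above on both predictors.
print(dict(table[(2, 2)]))
```

Each cell then holds a small frequency distribution of grades, which is what the numbers stacked inside a cell of a double-entry table represent.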


Table 1. Expectancy Table Prepared from the Grid in Figure 5*

Left-hand columns: Total Number; Number Receiving Each Grade (F, D, C, B, A)
Right-hand columns: Per Cent Receiving Each Grade (F, D, C, B, A); Total Per Cent

14 63 23 
39 35 26 
14 59 27 
6 19 56 19 
13 50 37 


2 13 37 32 16 


* The left-hand table summarizes the frequencies as they appear in the original
grid. The right-hand table shows these frequencies converted into per cents. (From
A. G. Wesman, Expectancy tables: A way of interpreting test validity. Test Service
Bulletin, No. 38. New York: The Psychological Corporation, 1949.)

Other kinds of expectancy tables may also be constructed to answer such
questions as, “How do we choose the best job applicants?” or, “What are the
chances that an office worker will obtain an average rating or higher?” (For
a more detailed analysis and application see Wesman, 1949, 1966; Anastasi,
1968, pp. 124-127.)

The expectancy table is especially useful in interpreting test predictions
to teachers and school administrators. There are limitations, however. The
primary drawback is the small number of cases used, which allows less
confidence than measures using large numbers of cases. It should be
remembered that the average score of a class is a more reliable figure than any
individual score.

Correlation Procedures 

In the area of criterion-related validity an attempt, by empirical means,
is made to correlate test scores with another criterion measure, such as school
marks or another test, administered at approximately the same time (in the
past referred to as concurrent validity). Correlation techniques are also used
in measuring the relationship of test scores and a measure of performance
or success obtained in the future. Examples of this would be correlating test
scores with future success in school or employment. The question of what
technique to employ is dependent on test usage; that is, is it the aim of the
test to predict future behavior or to assess present status?








* (From A. G. Wesman, Double-entry expectancy tables. Test Service Bulletin,
No. 56. New York: The Psychological Corporation, 1966.)


The use of correlation procedures involves the application of statistical
methods. Let us, then, turn our attention briefly to what correlation means
in statistical terms. Our concern in establishing criterion-related validity is
the relationship between a test score and an external criterion. We must have
some kind of statistical gauge to ascertain the extent of this relationship. A
statistic called a correlation coefficient expresses the degree of relationship of
our test score and outside criterion.

The correlation coefficient can have values ranging from +1.0 through zero
to -1.0. The value +1.0 indicates that the agreement between two different
variables, such as test scores and teacher grades, is perfect and positive. Put
another way, we can say that the person who had the highest test score also had
the highest grade, whereas the person with the lowest test score had the lowest
grade. The order continues in exact relationship through the whole set of scores
and grades. The value -1.0 indicates a perfect but negative relationship. In this
situation the scores go in exactly the opposite direction from the grades.
Thus, the person with the highest test score would have the lowest grade, the
person with the second highest test score would have the second lowest grade,
and so forth. If there is no systematic linear relationship between the test
scores and grades, the correlation coefficient is zero. (See Chapter 5 for a more
detailed statistical explanation.)
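The three landmark values of the coefficient can be seen in a short sketch. The function below computes the usual product-moment (Pearson) correlation; the function name and all of the scores and grades are invented for illustration and are not drawn from any test manual.

```python
from math import sqrt

def pearson_r(x, y):
    """Product-moment correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x)
    spread_y = sum((b - mean_y) ** 2 for b in y)
    return covariation / sqrt(spread_x * spread_y)

test_scores = [40, 50, 60, 70, 80]

# Grades that rise in exact step with the scores, fall in exact step,
# or show no linear trend at all (all values invented).
grades_same_order = [1.0, 1.5, 2.0, 2.5, 3.0]
grades_reversed = grades_same_order[::-1]
grades_unrelated = [3, 1, 2, 1, 3]

r_positive = pearson_r(test_scores, grades_same_order)
r_negative = pearson_r(test_scores, grades_reversed)
r_zero = pearson_r(test_scores, grades_unrelated)
print(r_positive, r_negative, r_zero)
```

Real validity data fall between these extremes, which is why a coefficient rather than a simple yes-or-no statement is reported.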

The Criterion

Thus far our attention has been directed to what criterion validity is and
the various methods to establish it. It has been stated that in order to possess
this form of validity one matches test scores with a criterion measure. Some
may ask, “How do we know if the criterion measure is any good? What would
we gain if we had a positive correlation between test scores and a poor
criterion?” These are relevant questions. One of the most difficult problems
that face the educator and test maker is deciding on a satisfactory criterion.
This is especially true in fields that have many variables which impinge on
success or failure. How does one establish, for example, a suitable criterion
basis for effective teaching? Does one use supervisory ratings? If so, what
about their quality? The chance of variance from supervisor to supervisor is
great. The atmosphere of the school and the level of ability and interest of
the students might affect the teacher’s attitude and instruction. In the area
of vocational selection the same problems exist in defining successful
performance. Conditions of work and managerial evaluations differ. It is obvious
that ratings are not always consistent and that there are many factors that
affect performance.

There are other criterion measures that may also be used. A college English
proficiency examination given to entering freshmen may be validated in terms
of its ability to predict future test scores on a comprehensive English
examination given a year later. The criterion measure here is the comprehensive
English examination. The most common criterion is a measure of success
such as grades in training or educational programs. Selection tests for
physicians, for example, may be validated against grades earned in medical school.
This procedure is open to question because of several problems. First of all,
grades tend to be unreliable; but even if they were not, their validity as a
basis for the criterion variable would not be very high. Teachers more than
anyone else know that grades have drawbacks even in terms of success in the
actual training program.

We must face the fact that no criterion measure is ever perfect. The
ultimate criterion is a constellation of many factors that only, possibly, may
be seen after a man has finished his productive work life. What is professional
success and by whose measure do we gauge it? The reader can supply the
other obvious philosophical questions and problems. The more important
thing to remember is the imperfectness of criterion measures and their
relative validity. Magnusson (1967) states, “The only way to make the
criterion data more valid is to refine the analysis of the variable we wish to
measure and as far as possible relate the criterion measurement to what we
consider to be the genuine criterion” (p. 127).

Validity Coefficients 

The most common method of reporting test validity is by the use of a
validity coefficient, which reveals the correlation between the test and criterion.
This estimation of validity by a correlation coefficient is called the coefficient
of validity. It demonstrates the relationship between test and criterion data.
The coefficient of validity provides an overall index of the validity of the test.
It is also more consistent and less prone to sampling error than the expectancy
table percentages because it utilizes all the cases in the group.

There are no test manuals at the present time that report validity coefficients
near or at +1.0. All fall short of perfect prediction. We would, of course, like
to have higher coefficients; however, any positive correlation signifies that
predictions from the test will be better than guesses.

The most important thing to remember when evaluating a validity
coefficient is the extent to which it may allow for the improvement of prediction
and judgment. If the validity coefficient is zero, knowledge of a test score does
not allow us to predict a student’s score on the criterion with any accuracy
at all. As the correlation between test scores and the criterion measure
increases, so does our ability to predict. Thus a person who scores in the top
quarter on the test will probably be in the top quarter on the criterion measure.
It should be noted, however, that some of our predictions will be in error
because some of those obtaining test scores in the top quarter will be in the
second quarter on the criterion measure while a smaller number will be in
the third quarter, and we may even find a few in the lowest quarter. The
larger the validity coefficient, the less chance of predictive error.


Standard Error of Estimate

The standard error of estimate is a statistical technique used to account for
the number of individuals for whom statistically calculated predictions are
wrong, and the magnitude by which the estimates are in error. If, for example,
validity is perfect, then the standard error of estimate is zero; if, on the other
hand, validity is zero, the standard error of estimate is maximal. Thus, the
standard error of estimate decreases as validity increases. Statements
concerning “improvement over chance” refer to the extent to which the standard
error of estimate is reduced.
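The relationship between validity and the standard error of estimate can be illustrated with the conventional formula, in which the standard error equals the standard deviation of the criterion multiplied by the square root of one minus the squared validity coefficient. The chapter does not print this formula, so the sketch below is offered as an assumption about the usual computation; the function name and the criterion standard deviation of 10 are hypothetical.

```python
from math import sqrt

def standard_error_of_estimate(sd_criterion, validity):
    """Conventional formula: SD of the criterion times sqrt(1 - r squared)."""
    return sd_criterion * sqrt(1.0 - validity ** 2)

# Assumed criterion standard deviation of 10 points, for illustration only.
sd = 10.0
for r in (0.0, 0.3, 0.6, 0.9, 1.0):
    print(r, round(standard_error_of_estimate(sd, r), 2))
```

With zero validity the error of prediction is as large as the spread of the criterion itself, and it shrinks to nothing only when validity is perfect, which matches the verbal description above.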

The teacher or counselor who is deciding on the selection of a test should
always note with caution statements by test authors or publishers that their
test is valid. It is important always to ask yourself, “Valid for what?” The
sheer magnitude of a validity coefficient does not assure validity for every
situation or need. What you want to know is whether what the test measures
is what you want measured.

What to Look for in the Test Manual

The following guidelines are presented for your assistance in reviewing
a test manual. The fact that not all of them may be mentioned in a test
manual should not surprise you. The better tests, however, will discuss the
majority of the following either in the test manual or in a technical
supplement.⁷


1. There should be an accurate description of all criterion measures.
Attention should be drawn to the aspects of performance not reflected
in the criterion.

2. Validity of the test for each criterion about which a prediction 
is to be made should be given. 

3. Many test-criterion correlations should be reported. 

4. Time periods in the test administration and collection of criterion 
data should be stated. 

5. The criterion score should be determined independently of test 
scores. 

6. Tests that report validity for grade predictions should clearly
state the way performance is measured. (Is it in line with your
procedures?)

7. There should be measures of central tendency and variability for 
the validation sample. (See Chapter 5 for a discussion of these con- 
cepts.) 

8. The manual should describe variables such as sex, age, socio- 
economic status, and level of education when these factors are related 
to what is being tested. For educational tests, reference should be made 
to the nature of the community and selection policy, if any, of the 
school. 


⁷ For a complete account and discussion of the recommendations see Standards
(American Psychological Association et al., 1966, pp. 16-23).




Construct Validity

This type of validity is ascertained by investigating what traits a test
measures, that is, what the test score tells us about a person. Does it relate
to some abstract construct that will give us insight into the person? Some
examples of such constructs are neuroticism, anxiety, and intelligence.
Construct validation requires a step-by-step accumulation of data from a
great many different sources. It requires a combination of logical and
empirical methods of examination. Basically, when studies of construct
validity are made, they are instituted to check on the actual theory that is
indigenous to the test. Thus, the investigator asks, “From this theory what
hypotheses may be made concerning the behavior of individuals with high
and low scores?” Data is then secured in order to test these hypotheses.
Inferences, based on the evidence, are then made concerning the theory's adequacy
to account for the collected data. If the investigator finds that the theory is
inadequate to render an explanation for the data, he will (supposedly) change
the test interpretation, restate or revise the theory, or completely refute the
theory. New evidence, of course, would be needed to show construct validity
for a revised interpretation (American Psychological Association et al., 1966).

Cronbach and Meehl (1966) discuss, at great length, the complexities of
construct validity and recommend its investigation "whenever no criterion
or universe of content is accepted as entirely adequate to define the quality
to be measured" (p. 70). They go on to state that the determination of
psychological constructs that account for test performance is a good practice
for almost any test.

Construct validity, because of its broad range of meanings and uses, can
create some misunderstandings. Cronbach and Meehl (1966) in their
discussion of construct validity state,


A construct is some postulated attribute of people, assumed to be 
reflected in test performance. In test validation the attribute about 
which we make statements in interpreting a test is a construct. We 
expect a person at any time to possess or not possess a qualitative 
attribute (amnesia) or structure, or to possess some degree of a quan- 
titative attribute (cheerfulness). A construct has certain associated 
meanings carried in statements of this general character: Persons 
who possess this attribute will in situation X, act in manner Y (with 
a stated probability). The logic of construct validation is invoked 
whether loose, used in ramified theory or a few simple propositions, 
used in absolute propositions or probability statements [p. 71].


The knowledge that the investigator has concerning content and criterion-
related validity would be used in analyzing construct validity. For example,
criterion-related validity in a college admissions test may be established by
correlating test scores with college grades, but the selection of grades as
the criterion may have come about through a consideration of what constructs
are most likely to provide a base for devising a good selection test.
Furthermore, a validity coefficient revealing a relationship between the test and
grades (or any other criterion) gives us no meaningful information about the
reason or reasons for the extent of the correlation. In order to be meaningful
it must be grounded in the context of some theoretical proposition. Thus
construct validity is commonly investigated when we wish to increase our
knowledge of the qualities that the test is measuring.

Construct validation is useful and important at times for every kind of
psychological test. For example, the degree to which a certain intelligence
test is free of cultural bias would be a task for construct validation. The
following are some techniques and procedures used to determine construct
validity. (For a more complete account see Cronbach and Meehl, 1966;
Anastasi, 1966, 1968; American Psychological Association, 1966.)

1. Correlations with other tests. The newly constructed test is
correlated with established tests that are already accepted measures
of the quality or trait being examined. The Stanford-Binet, for
example, has served as a criterion for validation of group intelligence
tests for many years. (It is also used in criterion-related validity.)
The construct to be measured is intelligence. The assumption is
that the Stanford-Binet measures intelligence; therefore, a high
correlation between the new test and the Binet means the new test
also measures intelligence.

2. Factor analysis. This statistical procedure is of particular 
importance to construct validity. Basically, factor analysis is a technique 
used for analyzing the interrelationships of psychological data. Its
major purpose is to simplify behavioral description by the reduction 
of the number of categories from a starting multiplicity of measure- 
ment (test) variables to a few traits. After these traits have been 
identified, they may be used to describe the factorial composition of a 
test. Thus a test may be identified in terms of the major factors deter- 
mining its scores as well as the weight of each factor (Anastasi, 1968). 

3. Experimentally induced effects. In order to discover how a test
would respond to changes in external conditions, experimentally
induced variables are presented. A test of anxiety, for example, could
be administered to an individual under conditions of stress. The
anxiety test scores could then be correlated with physiological and
other gauges of anxiety during and after testing.

What to Look for in the Test Manual 

The following guidelines, as in the cases of content and criterion-related
validity, are based on the Standards for Educational and Psychological Tests
and Manuals (American Psychological Association et al., 1966).⁸

⁸ For a complete account and discussion see pp. 23-24 of the Standards.




1. If the test is to measure a theoretical variable such as creativity
or anxiety, the proposed interpretation should be stated clearly and
completely. That is, a definition of the construct to be measured
should be given. Thus one might say “creativity" is that ability or
trait that leads to original contributions.

2. The manual should signify the degree to which the proposed 
interpretation has been proved. 

3. Evidence concerning the effect of speed on test scores and on 
their relationship with other variables should be stated. 


A Last Word on Validity 

The reader should bear in mind that the three aspects of validity (content,
criterion-related, and construct validity) are only conceptually independent.
It is very rare that only one of them is important in a specific situation. In
most cases a comprehensive and thorough study of a test would involve
data on all types of validity.

Remember that statements in the test manual concerning validity should
be specific and focused on the types of validity for the kinds of interpretations
to be made. No test is valid for all purposes, situations, or individuals. The
intended use of the test is the determining factor in the kind of evidence that
is needed. Let us briefly examine each, noting the basic kind of validity
evidence needed as well as its overlap with other types of validity.

1. Content validity. Content validity is of primary importance in
achievement tests. A test publisher consults with a group of subject-
matter experts who help devise and arrange test items they feel cover
the topics pertinent to the area represented by the test (content
validity). Criterion-related validity is also necessary to check against
a later criterion of performance. An achievement test may be used for
a selection program. A theoretical analysis of what is being measured
by the achievement test requires a consideration of construct validity.
Is, for example, a score on a mathematics test reflective of
mathematical ability, understanding, or memorization of data?

2. Criterion-related validity. Criterion-related validity is of primary
importance in intelligence or scholastic aptitude tests to reveal the 
ability to predict school or college success. The kind of aptitudes 
measured is evaluated, many times, by the content of the test items 
and correlations with other tests. 

3. Construct validity. Construct validity is of primary importance
in personality tests, especially where projective techniques⁹ are used.
If a diagnosis is to be made, other criteria such as psychiatric opinion
(criterion-related validity) are used at the time of testing or afterward.

⁹ See Chapter 12 for a discussion of projective techniques used in personality
testing.





It is obvious from our discussion that validity is a broad term encompassing
many different factors. Our first question, “What does the test measure?”,
is the one whose answer the classroom teacher, test author, and publisher
need to know. After this question has been answered, our next inquiry
concerns the accuracy of the test. Let us, then, turn our attention to reliability.


Reliability 

The second question we ask about a test is, “What is its reliability?”
Our question is not concerned with what the test measures, but how
consistently it measures whatever it does measure. What is the stability of the
test score? If we measure the same person again, how consistent will the
test scores be?

The reliability of a test refers to that quality of a test which demonstrates
test score consistency and stability. Thus, when Mrs. Gold’s eighth grade
takes the Jones Test of School Ability on two different occasions, are their
scores approximately the same or have they changed? If they are
approximately the same, we may say the test is reliable. Henry received a score of
60 the first time he took the test and a score of 62 the second time. His
scores are consistent. On the second administration of the same test his class,
on the average, received approximately the same scores as they did on the
first administration. The test seems to be reliable in that there was
consistency in the results obtained when testing was repeated on the same
students. A lack of consistency would have been evident if the students in
Mrs. Gold’s class had not obtained similar scores or held the same relative
test score positions. The determination of reliability on standardized tests,
of course, involves many more classes and individuals, but the principle
remains the same.

Reliability is a general term referring to many different types of evidence.
Each kind of evidence suggests the consistency to be expected among
similar observations. Specific types of errors or inconsistencies are explained
by different kinds of evidence. There is no single measure of test reliability
that is always preferable. The choice is dependent upon the intended
utilization of the test scores. Although there are various methods of estimating
the reliability of psychological and educational tests, the most commonly
used are based upon two measurements of the same subjects. The two
measurements may be obtained by three different techniques:

1. Retesting subjects with the same test. 

2. Alternate form of the original test, that is, correlation of original
test scores with scores on another independent test (different form)
with an item content similar to the original test.

3. Split-half, or odd-even, correlation, which involves a division
of the test into two parts, one part being the odd-numbered
questions and the other being the even-numbered questions. The
correlation between scores on the odd-numbered and the even-
numbered items yields a reliability coefficient for the entire test.
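The odd-even technique in item 3 can be sketched as follows. The item responses are invented and the function names are hypothetical. One step is added here that the text does not mention: the correlation between two half-length tests is commonly stepped up with the Spearman-Brown formula to estimate the reliability of the full-length test, and the sketch reports both figures so the reader can see the difference.

```python
from math import sqrt

def pearson_r(x, y):
    """Product-moment correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x)
    spread_y = sum((b - mean_y) ** 2 for b in y)
    return covariation / sqrt(spread_x * spread_y)

def split_half_reliability(item_scores):
    """item_scores: one list per examinee of 0/1 item scores, in item order."""
    odd_totals = [sum(person[0::2]) for person in item_scores]   # items 1, 3, 5, ...
    even_totals = [sum(person[1::2]) for person in item_scores]  # items 2, 4, 6, ...
    r_halves = pearson_r(odd_totals, even_totals)
    # Spearman-Brown step-up (an addition to the text's description).
    r_full = 2 * r_halves / (1 + r_halves)
    return r_halves, r_full

# Invented responses of five examinees to a six-item test (1 = right, 0 = wrong).
responses = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 0, 1, 0],
    [1, 0, 1, 1, 0, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
]

r_halves, r_full = split_half_reliability(responses)
print(round(r_halves, 3), round(r_full, 3))
```

Because only one administration is needed, this technique reflects consistency across items rather than stability over time.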

It is apparent from our brief discussion so far that different methods of
obtaining reliability take into consideration different sources of error. There
are various factors that contribute to “unreliability" or inconsistency. They
include (a) differences in the condition of the individual being tested, for
example, mood, physical state, and so on; (b) differences in the test content
or test situation; (c) variations in test administration, such as noise or
differences in the administrative skill of the tester; (d) mistakes and differences
in scoring and recording scores as well as variations in the process of
observation (American Psychological Association et al., 1966).

Up to this point we have discussed what reliability is and some of the 
general considerations in estimating its presence. Let us now return to the 
three specific techniques mentioned earlier; (1) retesting, (2) alternate 
forms, and (3) “odd-even.” 


Retesting

Testing individuals with the same test they have taken earlier is known as
retesting. If a physician, for example, wanted to check on the accuracy of
his nurse’s ability to measure his patients’ weight and height, he might ask
her to measure each patient twice using the same procedures. An even better
technique would be to have someone else do the second measuring, so that
the nurse’s recall of the first measurements would not influence the second
ratings. The physician of course might also want to know the exact weight
and height of a patient each day and how the measures vary. These two
instances provide us with two separate but related investigations, measures
of (1) individual variance and of (2) variation caused by the procedure of
measurement.

The reliability of measurement in height and weight assessments is less
complex than it is in standardized testing. However, the same principle is
involved. Let us take as an example a test of English and the procedures for
finding reliability under the retesting technique. The English test is
administered to a class and is immediately readministered. Measurement here
is contaminated because children are able to remember questions and do
not spend as much time with the second test. The children who were not
able to finish the first time will certainly be in a better position to complete
the test the second time.

It is important not only to determine the degree of variation of individual
response from one occasion to the next, but also to know the extent of
sampling variance involved in deciding on a given set of items. That is, there
is no reason to think that one set of fifty English usage items is superior or
inferior to another equivalent set of fifty. Suppose that one set of questions
deals with a unit recently covered by some of the children being examined.
These items would be especially easy for them. The test might then
overestimate their level of English usage. It would do so consistently on both
testings because the items would remain the same. Not every set of test items
is equally valid or reliable. The point to remember, then, is that although
the retesting method of determining reliability provides data regarding a
particular set of items used, it is possible to obtain a very different reliability
estimate if another set of items is used.

Retesting with an identical test may account for errors arising from differences
in answers at a specific moment and for variation in individuals from time to
time. It cannot rule out, however, the variation arising out of the specific set
of items chosen.¹⁰

Standards (American Psychological Association et al., 1966) is quite clear
on this point.

Aside from practical limitations, retesting is not a theoretically
desirable method of determining a reliability coefficient if, as usual,
the items that constitute the test are only one of many sets (actual or
hypothetical) that might equally well have been used to measure the
particular ability or trait [p. 25].


Alternate Forms 

The alternate forms measurement of reliability attempts to establish 
reliability by correlating scores, obtained by the same individuals, on two 
different forms of the same test. These alternate or equivalent forms are 
constructed with the same basic purposes in mind. They contain items of 
similar difficulty and cover the same areas of knowledge or skills even though 
they use different questions. Individuals may be tested with one form 
initially and then retested with the other form. The resulting correlation 
between the scores on the two forms is the reliability coefficient. This type 
of coefficient represents two aspects of test reliability — time stability and 
response consistency to different samples of items. 

Thus, alternate or equivalent test forms are variations on the same test
theme. They are individually constructed tests created to meet the same
specifications. Each form should contain the same number of items covering
the same kind of content and arranged in the same format. All aspects of
the test, including the degree of content difficulty, instructions, time limits,
and so forth, must be comparable. Thus, two equivalent intelligence tests
should contain items and questions of the same difficulty and should cover
the same kinds of areas, for example, numerical ability, abstract reasoning,
and vocabulary.


¹⁰ Though the retest technique is not appropriate for most psychological tests,
there are some motor and personality tests that are not greatly influenced by repetition
and are amenable to the retest method (Lindeman, 1967).



Once we have established the equivalence of our alternate forms we may
administer them either “back-to-back” (that is, with the second form
immediately following the completion of the first, if we are not concerned
with time stability) or separated by a time interval, if time is a consideration.

Alternate form reliability, like any other technique, is not free of problems.
There is the problem of practice effect, as in retest reliability. Although the
use of equivalent forms will reduce the effect of practice, it will not
eliminate it. Another limitation is the real difference between the forms; the
concern here is with the degree of difference between test items.

Given the limitations of the alternate form technique, it is still the most
appropriate method for most educational tests. It is, therefore, recommended
that the teacher or counselor give it the greatest amount of weight in
investigating the reliability of a test.

One final note of practicality is in order. The administration of a second
form of a test is expensive and time-consuming; therefore, many test
authors and publishers have resorted to other devices. Sometimes they are
satisfactory, but more often they are poor compromises.

Split-Half Reliability

If it is not practical to do testing on two different occasions or if it is desired
only to sample the content consistency without taking into account individual
response variation from one time to another, the “odd-even” or “split-half”
technique may be used. A test of 100 items may be divided into two sets of
fifty items each. One set contains all the even-numbered items, whereas the
other set contains all the odd-numbered items. The relationship between
scores on the even and odd sets is an “odd-even,” or “split-half,” correlation.
The correlation coefficient for the whole test of 100 items may then be
estimated by the Spearman-Brown formula. This formula thus makes it
possible to obtain an estimate of reliability from one administration of one
test.

The correlation obtained by comparing the odd-even test items is actually
the correlation between two tests each of which is one half the length of the
original test. At this juncture a correction is made by using the Spearman-Brown
formula. On our 100-item test, a coefficient of 0.70 was obtained
from the odd-even method. The formula is as follows:

\[
\text{Reliability of entire test} = \frac{2\,(\text{reliability of } \tfrac{1}{2} \text{ test})}{1 + (\text{reliability of } \tfrac{1}{2} \text{ test})}
\]

The actual process using our obtained coefficient of 0.70 derived from our 
correlating the fifty odd items with the fifty even items is: 

\[
\text{Reliability of entire test} = \frac{2(0.70)}{1 + (0.70)} = \frac{1.40}{1.70} = 0.82
\]

The correlation coefficient of 0.82, then, presents us with an estimate of
reliability of an entire test where the half tests provided us with a correlation
of 0.70.
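
The odd-even procedure and the Spearman-Brown correction can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a procedure taken from any test manual: the function names (`pearson`, `spearman_brown`, `split_half_reliability`) and the small set of item scores are hypothetical, and each answer is assumed to be scored 1 (right) or 0 (wrong).

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_brown(r_half):
    """Estimate full-length reliability from the half-test correlation."""
    return 2 * r_half / (1 + r_half)


def split_half_reliability(item_scores):
    """Return (half-test correlation, Spearman-Brown corrected reliability)."""
    # Python counts from 0, so row[0::2] holds items 1, 3, 5, ... (the odd items).
    odd = [sum(row[0::2]) for row in item_scores]
    even = [sum(row[1::2]) for row in item_scores]
    r_half = pearson(odd, even)
    return r_half, spearman_brown(r_half)


# The worked example from the text: a half-test correlation of 0.70.
print(round(spearman_brown(0.70), 2))  # 0.82

# Hypothetical item scores for six students on an eight-item test.
scores = [
    [1, 1, 1, 1, 1, 1, 1, 0],
    [1, 1, 1, 1, 1, 0, 0, 0],
    [1, 1, 1, 0, 1, 0, 0, 0],
    [1, 0, 1, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0, 0, 0],
]
r_half, r_full = split_half_reliability(scores)
print(round(r_half, 2), round(r_full, 2))
```

Whenever the half-test correlation lies between 0 and 1, the corrected figure is larger than the half-test figure, which is why the correction matters when a test has been split in two.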

The ease and convenience of the “split-half” method has led some test
authors and publishers to use it when more appropriate techniques, such as
the alternate form method, are indicated. Some cautions in the use of the
“split-half” technique are in order. First of all, the variation of an individual
from day to day is not recorded in this type of estimated reliability. Secondly,
it should not be used with a speed test, which is an examination made up of
relatively easy items that most individuals, if given enough time, will answer
correctly. The objective in many speed tests is to see how many items can
be responded to correctly in the indicated time. In computing “odd-even”
scores on a speed test, the two scores tend to be similar and the reliability
coefficient may be close to +1, or perfect. For illustrative purposes, let us
say a 100-item test depends entirely on speed, so that individual differences
in score rest completely upon number of items tried, rather than upon errors.
If Robert has a score of 84, he will have forty-two correct odd items and
forty-two correct even items; if Jim obtains a score of 64, he will have
thirty-two odd and thirty-two even correct. Thus with the exception of
accidental errors on a few questions, the correlation would be perfect
(+1.00).
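
The speed-test caution can be checked with a short Python sketch. It assumes a pure speed test in which every attempted item is answered correctly and items are attempted in order. Robert and Jim come from the text; the other three students and the helper name `odd_even_split` are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def odd_even_split(items_attempted):
    """Odd and even half scores when every attempted item is correct."""
    odd = (items_attempted + 1) // 2   # items 1, 3, 5, ...
    even = items_attempted // 2        # items 2, 4, 6, ...
    return odd, even


# Number of items each student reached on a 100-item speed test.
attempted = {"Robert": 84, "Jim": 64, "Ann": 91, "Carl": 57, "Dora": 70}
halves = {name: odd_even_split(n) for name, n in attempted.items()}
print(halves["Robert"], halves["Jim"])  # (42, 42) (32, 32)

odd_scores = [h[0] for h in halves.values()]
even_scores = [h[1] for h in halves.values()]
r = pearson(odd_scores, even_scores)
print(round(r, 3))
```

Because the two half scores can never differ by more than one point, the correlation is pushed toward +1 no matter how stable the students' performance really is.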

Most of our tests are not speed tests, and though they may be timed,
the results will generally not be as severely affected when the “split-half”
technique is used with them. The important thing to remember is that the
“split-half” procedure is based upon the consistency in the number of
errors made by the individual. If individual differences in test scores are
significantly affected by speed, a single-trial reliability coefficient is not an
appropriate measure (Anastasi, 1968).

Before leaving this area of reliability, one other method of estimating
internal consistency should be noted. It is the formula developed by Kuder
and Richardson (1937).¹¹ It does not require the division of the test into halves
and rescoring and calculating a correlation coefficient. This formula is based
on the assumption that every item in a test measures the same general
factors as do the others. This procedure leads to a reliability coefficient that
may be interpreted in the same manner as the “odd-even” coefficient. This
formula has drawbacks similar to those of the Spearman-Brown formula,
mentioned in our previous discussion, in that (1) it is not appropriate for
speed tests and (2) it does not measure individual variance from one time
to another.¹²
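
For readers who want to see the computation, here is a minimal Python sketch of Kuder-Richardson formula 20 in its usual textbook form: the number of items divided by one less than that number, multiplied by one minus the ratio of the summed item variances (p times q for each item) to the variance of total scores. The function name `kr20` and the five students' item scores are hypothetical, items are assumed to be scored 1 or 0, and the variance of total scores is computed as a population variance, which is one common convention.

```python
def kr20(item_scores):
    """Kuder-Richardson formula 20 for items scored 1 (right) or 0 (wrong)."""
    n_students = len(item_scores)
    n_items = len(item_scores[0])

    # Variance of the students' total scores (population form).
    totals = [sum(row) for row in item_scores]
    mean_total = sum(totals) / n_students
    var_total = sum((t - mean_total) ** 2 for t in totals) / n_students

    # Sum of p * q over items, where p is the proportion answering correctly.
    sum_pq = 0.0
    for j in range(n_items):
        p = sum(row[j] for row in item_scores) / n_students
        sum_pq += p * (1 - p)

    return (n_items / (n_items - 1)) * (1 - sum_pq / var_total)


# Hypothetical scores for five students on a four-item test.
scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(round(kr20(scores), 2))  # 0.8
```

Only one administration and one scoring of the test are needed, which is the practical appeal of the method. The limitations named in the text still apply.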


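¹¹ The formula most commonly used is called the Kuder-Richardson “formula 20.”
It is stated:

\[
r_{tt} = \frac{n}{n-1}\left(\frac{\sigma_t^{2} - \sum pq}{\sigma_t^{2}}\right)
\]

where n is the number of items in the test, \sigma_t^{2} is the variance of the
total scores, p is the proportion of persons passing each item, and q = 1 - p.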


¹² For a complete discussion and statistical treatment see Cronbach, 1960;
Nunnally, 1967; and Magnusson, 1967. For those with a good mathematical background,
Lord and Novick (1968) is an excellent advanced treatment of tests and statistical
theories.



Standard Error of Measurement 

In our discussion of testing thus far we have seen some of the factors that
affect the accuracy of a test score. To state, then, that no test is perfectly
accurate should not surprise the reader, nor should anyone at this point think
that a person’s test score is determined only by his ability or knowledge.
It is true, of course, that usually a person is the primary determiner of his
own score, but the score is also a reflection of the inaccuracy of the test itself.
A statistical technique which accounts for this test error and allows us to
estimate the margin of error in the test score is the standard error of
measurement. It is especially useful, in the interpretation of individual scores, when
attempting to estimate the expected degree of variation in a student’s test
score. If Mary, for example, obtains an IQ of 116, how much confidence can
we place in this score? Will she obtain an IQ of 128 next testing session or
an IQ of 104?

It is true that the reliability coefficient gives an estimate of accuracy,
but it does not assist specifically in interpreting individual scores. The
numerical value of the coefficient is dependent to a great degree on the range
of scores in the group being examined. That is, if a group has a small spread
in the ability being measured, the coefficient will be low, whereas if the group
has a large spread in a particular field, it will be higher. The standard error
of measurement does not have these difficulties.

Let us look at an actual school situation where knowledge of the standard
error of measurement would be used. An intelligence test has been ad- 
ministered to the ninth-grade classes at Stevens Junior High School. Mrs. 
Olson, a ninth-grade teacher, is asking the counselor about one of her 
students. 


Mrs. Olson: How did Donald Smith do on the test?

Counselor: Let’s see. His IQ score is around 110.

Mrs. Olson: What do you mean “around 110”? Is it 110 or is
it not?

Counselor: The reason I say “around 110” is that no score is
perfectly accurate, only an estimation. His actual obtained IQ score
was 110, but the standard error of measurement is 6.

Mrs. Olson: What is the standard error of measurement?

Counselor: It is a statistic which describes reliability; that is, it
tells us about the accuracy of an individual test score. In Donald’s
case, for example, we know his obtained IQ score was 110 and that
the test has a standard error of six points.

Mrs. Olson: I know you said that before, but what about the six
points?

Counselor: It isn’t complicated. Donald’s score is 110, the standard
error is 6; therefore Donald’s “true” score lies somewhere between
104 and 116. You see, the six points indicate the amount of test error
that must be considered in the interpretation of Donald’s score.

Mrs. Olson: I see, but what is a “true” score?

Counselor: A true score is one that Donald would obtain if the test
were perfectly reliable. The six points provide the limits within
which we may expect to find Donald Smith’s “true” IQ score. If
Donald were tested over and over again in the same exact situation,
68 per cent of his scores would fall within one standard error of his
“true” score, 95 per cent would fall within two standard errors, and
99 per cent would fall within three standard errors.

Mrs. Olson: Hold on, I am getting a little confused. What is this
“standard error”?

Counselor: Remember the normal curve and how, for example,
68 per cent of the cases fall within -1 and +1 standard deviations
from the mean?

Mrs. Olson: Yes, I do. Do you mean those percentages are based
on the normal curve? [Note: A discussion of the normal curve and
standard deviation can be found in Chapter 5.]

Counselor: Precisely. The normal curve is our basic frame of
reference. In more concrete terms, then, we know that your student,
Donald Smith, has obtained an actual score of 110 and that one
standard error would make his true score somewhere between 104
and 116.

Mrs. Olson: Hold on. Let me see if I can figure out the rest. The
standard error is six points. Therefore, one standard error is from 104
to 116 because you added 6 to 110 at the upper range and subtracted
6 from 110 at the lower range.

Counselor: Right. I do the same with two standard errors by
doubling the 6 on both sides and I have a range of 98 to 122.

Mrs. Olson: And three standard errors would be a tripling of the
standard error of six points. Donald’s “true” score then, 99 out of
100 times or three standard errors, would lie between 92 and 128.

Counselor: Exactly. Now do you see why I said around 110?

Mrs. Olson: I certainly do. Thanks for the information. I’ll never
again think that someone who has an IQ of 116 is much smarter than
someone with an IQ of 110.

Counselor: At least you won’t on this test. Remember, knowing
the standard error of measurement for each test is very important
before you make an interpretation of test scores.

Now that Mrs. Olson knows something about standard error of measurement
she would not be shocked to find that on another testing her student
may obtain an IQ of as high as 122. The standard error should make us
cautious in attaching meaning to minor elevations or depressions in test
scores. Remember that in actual practice we do not know an individual’s
“true” IQ; we know only the IQ obtained on one test.

The standard error of measurement is one of the reasons that one test
score should never be thought of as a fixed number but as a score within
a band where the true score lies. A large standard error means the band
is wide, and we would have less confidence in our obtained score
than if the standard error were smaller. For example, what if Mrs. Olson’s
student obtained an IQ of 110 on a test with a standard error of ten points?
The band where his “true” score might be (one standard error) would be
between 100 and 120. How comfortable can the teacher or counselor feel
with a band that reveals there is a good chance that the student’s true IQ is
average (100) or superior (120)? If we carry it out to two standard errors,
the band where the true score might be found extends from the lower
limits of average (90) to the very superior range (130). If, on the other hand,
the standard error is two points, the teacher can feel fairly comfortable
(carried out to two standard errors) that her student’s obtained IQ of 110
will fall on either side of his true IQ by four points, or from 106 to 114.
This range would indicate solid average ability.
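
The band arithmetic used by the counselor and in the examples above can be written as a short Python sketch. The function name `score_band` is hypothetical; the scores and standard errors are those given in the text.

```python
def score_band(obtained, standard_error, n_errors=1):
    """Return the (low, high) band around an obtained score."""
    margin = n_errors * standard_error
    return obtained - margin, obtained + margin


# Donald's obtained IQ of 110 on a test with a standard error of 6.
for n in (1, 2, 3):
    print(n, score_band(110, 6, n))

# The same obtained score with a standard error of 10, then of 2,
# each carried out to two standard errors.
print(score_band(110, 10, 2))
print(score_band(110, 2, 2))
```

The first loop reproduces the bands 104 to 116, 98 to 122, and 92 to 128. The last two lines give 90 to 130 and 106 to 114, which shows how quickly the band widens as the standard error grows.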

Table 3 presents the relationship between the reliability coefficient and
the standard error of measurement. The standard errors of measurement
for different reliability coefficients and standard deviations¹³ can also be seen
in Table 3. Note that as the reliability coefficient increases, the standard
error decreases. It is obvious then that the higher the reliability, the smaller
the error in individual test scores.

A great many test manuals report both the reliability coefficient and 
standard errors of measurement. If the standard error of measurement is 
not given, Table 3 would enable you to make an approximation. 

To obtain the standard error for a test from Table 3, note the reported
reliability coefficient and standard deviation as given in the test manual and
match them with or near the coefficients and standard deviation (SD) in
the table. Let us suppose, for example, that you find that the Brown Test of
School Ability has a reliability coefficient of 0.87 and a SD of 12. The first
thing you would do is find the appropriate coefficient, which is 0.85 (because
it is nearest to 0.87), and then go down the column (0.85) until you are next to
the SD row of 12. The number 4.6 (under the 0.85 column and directly
across from the number 12 under SD) is the approximate standard error of
measurement.
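
When no table is at hand, the standard error of measurement can be computed directly. The sketch below uses the conventional formula, the standard deviation multiplied by the square root of one minus the reliability coefficient. It is an assumption that the table in this chapter was built from this formula, although the value it gives for the example above agrees with the table entry. The function name is hypothetical.

```python
from math import sqrt


def standard_error_of_measurement(sd, reliability):
    """Conventional formula: SD times the square root of (1 - reliability)."""
    return sd * sqrt(1 - reliability)


# The table lookup from the text: SD of 12, nearest tabled coefficient 0.85.
print(round(standard_error_of_measurement(12, 0.85), 2))  # 4.65

# Using the reported coefficient of 0.87 directly gives a slightly smaller value.
print(round(standard_error_of_measurement(12, 0.87), 2))  # 4.33
```

The direct computation shows how much is lost by rounding the coefficient to the nearest tabled value: about three tenths of a point in this case.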

In closing, it should be restated that the standard error of measurement
and the reliability coefficient are both methods of demonstrating test reliability.
The standard error is not directly comparable from one test to another and
is independent of the variance of the group on which it is computed.¹⁴


¹³ Standard deviation is a statistical term referring to score variance, that is, it is
a measure of the distribution of scores. See Chapter 5 for a more complete discussion.

M,",S.'s‘L."a56n; SSfw KlL1l96^N»n„Aly (1967); CoPh™ (««)j 
For"^ mile ksi statistol •nil mo" disewBioo of this subj, .t see Doppelt 

(1956) and Cronbach (1960).






reliability of a test in a specific situation, we also know the extent of validity
in that situation. That is, the boundaries beyond which validity cannot rise.
In addition we know from our previous discussion that reliability helps us
to know the band of test score error and exactly how much weight we may
give to an individual score. A comment in this general area from the
Standards (American Psychological Association et al., 1966) seems appropriate
before we launch into our discussion of specific factors that influence
reliability.

Reliability is a necessary but not a sufficient condition of validity.
Reliability coefficients are pertinent to validity in the negative sense
that unreliable scores cannot be valid. But reliable scores are by no
means ipso facto valid, since validity depends on what interpretation
is proposed. Reliability is of special importance in support of, but not
in replacement of, the analysis and estimation of content, criterion-
related, and construct validity [p. 29].

Factors Affecting Reliability

Let us turn our attention to four specific influences on test reliability. 

1. Length of test. In our discussion of the Spearman-Brown formula it
was seen that the length of a test may affect the reliability coefficient. The
chance of measurement errors decreases proportionately with the length of the
test. That is, the longer the test, the greater the chance that the score is a
reflection of the person being tested and that it is a more accurate estimation
of his ability, achievement, or any other characteristic being measured.
This is logically true because we have increased the number of samplings
of the characteristic we wish to measure. If, for example, in an American
history course you administer a test consisting of one essay question
concerning the Civil War period, how reliable will your results be? The students
who happened to know that particular area would get a perfect score,
whereas those who were weak in that area would get zero. Let us suppose
you increased the number of questions to five; could you then feel more
comfortable in evaluating your students’ knowledge of the Civil War?
Undoubtedly you would say yes, but with the reservation that even more
questions (say, ten or fifteen more, or 100 multiple-choice) would be an
even better device. Thus, by increasing the size of our sample and thereby
lengthening the test we increase the reliability of our instrument.

Of course there are practical limitations to increasing the length of a test. 
Factors such as time available for testing, number of good questions one is 
able to write, and student fatigue all limit the length of the test. If you must 
have short tests then a more frequent testing schedule would provide a 
greater sampling of what you want to measure and would consequently be 
more reliable. In interpreting standardized test results, be wary of subtest 
scores based on relatively few items. If no reliability data are given for them,
the best thing is to use only the total score or scores. 
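
The effect of length can be put in numbers with the general form of the Spearman-Brown formula, of which the split-half correction is the special case of doubling. The sketch below is illustrative only: the function name `lengthened_reliability` is hypothetical, and the formula assumes the added items are comparable in content and difficulty to the original ones.

```python
def lengthened_reliability(reliability, factor):
    """General Spearman-Brown estimate for a test made `factor` times as long."""
    return factor * reliability / (1 + (factor - 1) * reliability)


# A short test with a reliability of 0.50, lengthened with comparable items.
for factor in (1, 2, 3, 5):
    print(factor, round(lengthened_reliability(0.50, factor), 2))

# A factor below 1 estimates what is lost when a test is shortened.
print(round(lengthened_reliability(0.90, 0.5), 2))
```

Under these assumptions a test with a reliability of 0.50 rises to about 0.67 when doubled and to about 0.83 when made five times as long, while cutting a test with a reliability of 0.90 in half lowers the estimate to about 0.82.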




2. Range of talent. The reliability coefficient, as we have stated before,
varies with the extent of talent in a group even though the stability of
measurement is not affected. A wide range of talent yields high coefficients, whereas
a small range produces low coefficients. Thus, to interpret a coefficient
properly a measure of the variability of the group is needed.

Table 4 illustrates this range effect or spread of scores on two forms of an 
arithmetic test administered to twenty students. Note that changes in rank 
from one form to the other are rather insignificant. These data would 
produce a fairly high coefficient. 

Table 4 Raw Scores and Ranks of Students on Two
Forms of an Arithmetic Test*

              Form X            Form Y
Student    Score   Rank      Score   Rank
   A         90      1         88      2
   B         87      2         89      1
   C         83      3         76      5
   D         78      4         77      4
   E         72      5         80      3
   F         70      6         65      7
   G         68      7         64      8
   H         65      8         67      6
   I         60      9         53     10
   J         54     10         57      9
   K         51     11         49     11
   L         47     12         45     14
   M         46     13         48     12
   N         43     14         47     13
   O         39     15         44     15
   P         38     16         42     16
   Q         32     17         39     17
   R         30     18         34     20
   S         29     19         37     18
   T         25     20         36     19

* Test Service Bulletin, No. 44. New York: The Psychological Corporation, 1952.


However, if we examine only the five highest students and their ranks,
the importance of the changes becomes greater. Student C’s change in rank
from third to fifth, in the larger group, represents only a 10 per cent shift
(two places out of twenty). The same shift, in the smaller group, is a 40 per
cent change (two places out of five). If we use all twenty students to
calculate the reliability of the test, it is evident that going from third on form
X to fifth on form Y still leaves the student in the top part of the distribution.
On the other hand, if the estimation of reliability is only on the group of the
top five students, this change from third to fifth means a drop from the
middle to the bottom of this distribution. If we based our coefficient on these
five cases it would be very low. Again, it should be noted that it is not the
smaller group which brings about a lower coefficient, but the narrow range
of talent. If you take five other cases, such as A, E, J, O, and T, who rank
from first to twentieth, a coefficient as great as that based on all twenty
students would be produced (Wesman, 1952).

This example illustrates why the reliability coefficient may vary although
the test items and the students’ performances are unchanged. Remember
that you need to have information on the range of ability in the tested group
before you may correctly interpret the reliability coefficient of the test.
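
The range effect can be verified from the scores in Table 4 with a short Python sketch. The correlation used here is the ordinary Pearson coefficient between raw scores on the two forms, which is an assumption about how the coefficient would be computed. The Form Y score for student H is taken as 67, a value consistent with that student's rank of 6. The helper names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Table 4 raw scores for students A through T.
# H's Form Y score (67) is an assumed reading consistent with its rank of 6.
form_x = [90, 87, 83, 78, 72, 70, 68, 65, 60, 54,
          51, 47, 46, 43, 39, 38, 32, 30, 29, 25]
form_y = [88, 89, 76, 77, 80, 65, 64, 67, 53, 57,
          49, 45, 48, 47, 44, 42, 39, 34, 37, 36]


def correlation_for(indices):
    """Correlation between forms for the students at the given positions."""
    xs = [form_x[i] for i in indices]
    ys = [form_y[i] for i in indices]
    return pearson(xs, ys)


r_all = correlation_for(range(20))              # all twenty students
r_top5 = correlation_for(range(5))              # A, B, C, D, E
r_spread = correlation_for([0, 4, 9, 14, 19])   # A, E, J, O, T

print(round(r_all, 2), round(r_top5, 2), round(r_spread, 2))
```

The five students spread across the whole range give a coefficient about as high as the full group, while the five highest students give a noticeably lower one, even though both subgroups are the same size.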

3. Ability level of the group. The ability level of the group is a factor 
similar to the one just discussed. When you interpret a student’s test score, 
remember that the most meaningful reliability coefficient is one which rests 
on the reference group that is comparable to that of the student. It is, of 
course, impossible for a test manual to present reliability for all possible 
group memberships. 

The appropriate comparison group is based on what we want to know.
If we are testing ninth-grade boys for mechanical aptitude, we should have
reliability coefficients based on the scores of ninth-grade boys. The
coefficients are less meaningful when they are based on “all high school” boys
taking the test. They become even less pertinent when the coefficient rests
on all high school and college students taking the test. The coefficient of
reliability becomes increasingly meaningful the closer we can come in
comparing the group we want to know about with the original group on
which the coefficient was based.

4. Method used. It is very important to consider the procedures used in 
obtaining the reliability coefficient when comparing two different tests. 
The size of the reliability coefficient is related to the methods used. Different 
procedures treat various sources of variance differently.

It cannot be said that because procedure A obtains a higher coefficient
it is better than procedure B, which yields a lower reliability coefficient.
For example, the “split-half” operation usually produces the highest reliability
coefficient. We know that it is not the best technique and that speed may
unduly influence the value of the coefficient it produces. (See the previous
section on “odd-even” and “split-half” procedures.) On the other hand, the
most demanding and generally most appropriate procedure, that is, the
alternate or equivalent form method, when used with a time interval
between test and retest, yields the lowest coefficient. Do not be
impressed by the sheer elevation of the coefficient. The value of the reliability
coefficient should be considered, but remember that the methods used are
reflected in the coefficient and warrant your attention.





Height of Reliability Coefficient

A reliability coefficient should be as high as possible. Unfortunately 
perfection is not now possible, so we must settle for the best we can get. 

The degree of reliability should be determined by the purposes and 
situations for which we intend to use the measurement instrument. The 
school psychologist who must decide on the possibility of placing a child 
in a mentally retarded class or state institution needs the most reliable 
instruments available. The counselor attempting to ascertain parental 
attitudes toward educational policy is of course not as concerned with 
reliability, because only the average figures need to be highly accurate, 
not the individual parental responses. 

If an instrument with low reliability is the “best” or only device and you
need to use it, be careful in making evaluations. Obtain all types of data and
use the test results with this information on a tentative basis. As we stated
in our discussion of validity coefficients, even a poor but significant coefficient
is better than nothing. The basic principle to keep in mind is that the
importance of the decision determines the need for precision in measurement.
The greater our need for confidence in the stability and consistency of the
test, the more we need higher reliability (Wesman, 1952).

What to Look for in the Test Manual 

The following guidelines will in most cases be familiar to you from our 
recent discussion of reliability. They are intended only as a quick checklist 
to help you in evaluating a test’s reliability. They represent only some of 
the most important features and are based largely on the recommendations in 
Standards for Educational and Psychological Tests and Manuals (American 
Psychological Association et al., 1966). The reader is referred to Standards
for a complete and detailed description. 

1. Reliability evidence should be reported to the extent that you
may judge whether scores are dependable for the recommended
purposes of the test. If any important data have not been obtained,
this should be mentioned.

2. Every score, subscore, or combination of scores should be
judged by the standards for reliability.

3. Reports on reliability or error of measurement should be given
in enough detail to permit you to judge if the data are applicable to
the types of persons you desire to examine. For example, is there
evidence that indicates that reliability was obtained, in a mechanical
comprehension test, on girls as well as boys?

4. The reliability analysis for an intelligence or achievement test
intended to be used to make differentiations within one school grade
should be based on pupils only within the actual grade. It should not
be based on many grades with a broader range of ability.




5. The test manual should state if there are significant changes in
the error of measurement from score level to score level.

6. Test authors and publishers should report reliability investigations
in standard statistical terms (for example, standard error of
measurement, reliability coefficients, and so on). It is their job to
communicate with you. Do not be awed by unconventional statistics.
If the statistical usage is unusual the test author should present a
complete explanation of why and what these statistics mean.

7. Reliability is very important but it is not a replacement for
validity. Reliability does not demonstrate validity; it can only
support it.

8. If two forms of a test are used, both of which are intended for
the same subjects, averages and spread of scores as well as the
coefficient of correlation between the tests should be given.

9. Sometimes measures of internal consistency are most
appropriate; however, they should not be thought of as substitutes for
other measures. If alternate forms are available, they should be used
and alternate form reliabilities should be reported as the preferred
technique. This does not mean coefficients resting on internal
analysis should be omitted. It only means that alternate forms have
first preference.

10. In most cases estimates of internal consistency should be
based on the “split-half” or Kuder-Richardson technique. Any
deviation from this should be clearly explained in the test manual.

11. Careful attention to a review of reliability coefficients based on
internal analysis, especially on time factors, is important. If speed
is a factor the coefficients will be exceptionally high and tend to be
insignificant.

12. The test manual should indicate to what degree test scores are
likely to change after a given amount of time has elapsed. The mean
and standard deviation of scores and correlation at each testing should
also be reported.


Practical Concerns 

Until now our attention has been focused on the technical and theoretical
aspects of testing. These, of course, are of primary importance in selecting
tests; however, practical considerations cannot be overlooked. Financial
aspects of testing and test time are necessary concerns of the school
administrator and his staff. In addition, ease of administration and scoring are
important factors, because teachers generally have a minimum amount of
experience and training in testing.





Economic Aspects 

Money is a very important consideration when formulating educational 
policy. Testing must take its turn in the line of educational needs awaiting
financing. Fortunately, testing is relatively inexpensive, especially when 
compared to other educational needs. In addition, federal and state allow- 
ances almost guarantee every school district in the United States enough 
funds to maintain an adequate standardized testing program. 

Because funds are not unlimited, it is desirable to save when possible. 
One of the first places where it is possible to save is in the reuse of test 
booklets that have separate answer sheets. Thus, the only yearly expenses 
are answer sheets and occasional replacements of worn-out booklets. Test 
booklets with separate answer sheets, however, should not be used in the 
primary or lower elementary grades. The end of the fifth grade or beginning 
of the sixth is probably early enough to begin using separate answer sheets. 
However, there may be situations where the children’s sociopsychological, 
intellectual, and motor skills are very well developed, in which case an earlier 
grade would be appropriate. On the other hand, in some settings junior high 
school would be early enough. 

Administrative Aspects

Tests are generally administered by teachers or other educational personnel
with limited measurement training. The ease of administering a test is
facilitated by simple and clear directions. Administering a test with a great
many subtests, each requiring exact (stopwatch) timing and new directions,
is an exacting job. This may lead to errors in directions and timing which
could affect the final results. The validity and reliability of the test scores
would then be questionable.

Time Aspects 

Saving time in test administration should be approached with extreme 
caution. In our discussion of reliability it was stated that the reliability of a
test is dependent on the length of the test. Thus, shortening the test time is
generally accomplished at the expense of test reliability. This is particularly
true of some “quickie” tests on the market that claim to produce a reliable
IQ score in fifteen or twenty minutes. Some tests are efficiently constructed,
but in most cases a reduction in testing time means a loss of reliability.
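
The cost of shortening a test can be made concrete with a small calculation. The sketch below assumes the Spearman-Brown formula, a standard way of estimating how reliability changes with test length. The formula is not named in this section, and the starting reliability of .90 is a hypothetical figure chosen only for illustration.

```python
def spearman_brown(reliability, length_factor):
    """Estimate reliability after multiplying test length by length_factor.

    Assumes the Spearman-Brown formula: r_new = k*r / (1 + (k - 1)*r).
    """
    k = length_factor
    return (k * reliability) / (1 + (k - 1) * reliability)


full_length_reliability = 0.90  # hypothetical reliability of the full test
half_length_reliability = spearman_brown(full_length_reliability, 0.5)

# Cutting the test in half lowers the estimated reliability.
print(round(half_length_reliability, 2))  # 0.82
```

Under these assumptions, a test cut to half its length drops from .90 to roughly .82, which is the kind of loss the paragraph above warns about.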

Scoring Aspects 

There are many teachers who have viewed testing with horror because of
tedious hours spent hand scoring. To make matters worse, many times the
directions for scoring required a test specialist to interpret. Today, by the
use of separate answer sheets and machine scoring, this problem has at least
been reduced for those teachers who teach upper elementary grades and
beyond. In addition, most contemporary test manuals go to great lengths to
present scoring procedures in simple and easy to understand terms.*

Tests for children in the primary grades (K-3) must of necessity involve
more time in scoring because young children may find separate answer sheets
confusing. By the middle elementary grades (about the middle or end of the
fourth grade) there are techniques, such as answer spaces at the right side of
the page which can be scored with an answer key, that lessen the scoring
burden. (Review Chapter 3, section on scoring for more details.) There is
every reason to choose a test that is easily scored over one that is difficult
to score if this does not sacrifice validity or reliability.


Interpretive Aspects 

There is no point in an elaborate testing program if the results are not
meaningful in educational planning. Test results that are hard to understand
or easy to misinterpret are not only a waste of time but in some cases harmful
to the children we are attempting to help.

The manual should present cogent and clear statements concerning the
meanings of scores. Do not administer any test, even if it has all the positive
features we have discussed, until you are sure you know what to do with the
results. Tests are constructed to tell us something. If they do not do this,
they are a meaningless exercise in the consumption of time.


References 

American Psychological Association. Technical recommendations for psychological
tests and diagnostic techniques. Washington, D.C.: American Psychological
Association, 1954.

American Psychological Association, American Educational Research Association,
and National Council on Measurement in Education. Standards for educational
and psychological tests and manuals. Washington, D.C.: American Psychological
Association, 1966.

Anastasi, A. Some current developments in the measurement and interpretation
of test validity. In A. Anastasi (Ed.), Testing problems in perspective. Washington,
D.C.: American Council on Education, 1966. Pp. 307-17.

Anastasi, A. Psychological testing. (3rd ed.) New York: Macmillan, 1968.

Committee on Test Standards, American Educational Research Association;
National Education Association; and National Council on Measurements Used
in Education. Technical recommendations for achievement tests. Washington, D.C.:
The National Education Association, 1955.

Cronbach, L. J. Essentials of psychological testing. (2nd ed.) New York: Harper &

Cronbach, L. J., and Meehl, P. E. Construct validity in psychological tests. In
C. I. Chase and H. G. Ludlow (Eds.), Readings in educational and psychological
measurement. Boston: Houghton Mifflin, 1966. Pp. 68-92.





Doppelt, J. E. How accurate is a test score? Test Service Bulletin, No. 50. New 
York: The Psychological Corporation, 1956. 

Games, P. A., and Klare, G. R. Elementary statistics: Data analysis for the
behavioral sciences. New York: McGraw-Hill, 1967.

Kuder, G. F., and Richardson, M. W. The theory of the estimation of test
reliability. Psychometrika, 1937, 2, 151-60.

Lindeman, R. H. Educational measurement. Glenview, Ill.: Scott, Foresman, 1967.

Lord, F. M., and Novick, M. R. Statistical theories of mental test scores. Reading,
Mass.: Addison-Wesley, 1968.

Magnusson, D. Test theory. Reading, Mass.: Addison-Wesley, 1967. 

McCollough, C., and Van Atta, L. Introduction to descriptive statistics and
correlation: A program for self-instruction. New York: McGraw-Hill, 1965.

Noll, V. H. Introduction to educational measurement. (2nd ed.) Boston: Houghton
Mifflin, 1965.

Nunnally, J. C. Psychometric theory. New York: McGraw-Hill, 1967.

Popham, W. J. Educational statistics: Use and interpretation. New York: Harper &
Row, 1967.

Thorndike, R. L., and Hagen, E. Measurement and evaluation in psychology and 
education. (2nd ed.) New York: Wiley, 1961. 

Wesman, A. G. Expectancy tables — a way of interpreting test validity. Test Service 
Bulletin, No. 38. New York: The Psychological Corporation, 1949. 

Wesman, A. G. Reliability and confidence. Test Service Bulletin, No. 44. New
York: The Psychological Corporation, 1952.

Wesman, A. G. Double-entry expectancy tables. Test Service Bulletin, No. 56. 
New York: The Psychological Corporation, 1966. 


CHAPTER 


Statistics, Norms, 
and Standard Scores 



Some Statistics 

This chapter is not intended for statisticians or mathematically oriented
teachers (though they may read it too). It is geared to convey in simple
terminology enough statistical insights to enable the reader to understand
testing better. Actually, you already know some of the terms, such as standard
deviation and correlation coefficient, from our previous discussions. (If the
reader has been exposed to basic statistics either in course work or in other
readings he could skip Statistics and go on to Norms and Standard Scores.)

Educators and psychologists talk a great deal about individual differences. 
Through measurement techniques as well as other devices there is a continual 
search to find out how humans distribute themselves on some measured 
characteristic or group of characteristics. These investigations form the basis 
for the inferences made about individual differences. 


Rationale for Statistics 

At this point let us pause and examine feelings that many of you may have
toward statistics. Some of you may be saying, “What good are they? I
won’t be using them anyway.” If you have encountered difficulty with
mathematics, you are apt to be anxious. On the other hand, those of you
who have experienced little difficulty or have found mathematics fun and
exciting may be eager to begin. It is necessary, however, that all of you, no
matter what your mathematical ability, learn simple shorthand techniques
in elementary statistics. These are the tools of the trade. You will need to know
simple statistical procedures in order to give meaning to your standardized
testing programs; in order to evaluate properly standardized tests of scholastic
aptitude, achievement, and so on; and in order to read intelligently
professional books and journals.

If you are one of those who finds numbers confusing, you may take comfort 
from the following: 


1. It is said that Charles Darwin frankly admitted trouble with 
statistics. 

2. Sir Francis Galton, who was instrumental in introducing statistics
into psychology, at times had to ask others to help or to do some
of his mathematical problems.

3. The advent of the computer enables the student to concentrate 
on the use rather than on the mechanics of statistics. Thus, those of 
you who are mathematically unsophisticated need not be greatly 
concerned. The important thing today is understanding when to use 
a certain technique or method. 

4. The advanced statistics that require mathematical insight and
skill need not concern the teacher and school counselor.

5. Elementary statistics requires no more than an average seventh- 
grade understanding of arithmetic. 


Language 

A number of symbols and shorthand devices have been developed in order
to enable us to describe the characteristics of groups and individuals in
comparison with other groups. These symbols are similar to the symbols
you are now putting together in order to read this page. That is, I have
translated my neurological impulses into learned word symbols which we call
thoughts. These thoughts I have further translated into learned letter symbols
which are being put together to form learned word or thought symbols. These
symbols you recognize as words on paper but more importantly they
communicate ideas or “thoughts” to you.

In communicating the information necessary to understand tests and test
results we use a different language, the language of statistics. Thus with a
single number, letter, or symbol an idea may be conveyed which would require
a paragraph, or even a page, of verbal discourse.

The objective of statistics in testing is to describe a score, a set of scores,
and the various relationships between them. A score on any measure or test
has little or no meaning in and of itself. Meaning is given to a score by its
relationship to other scores in a given measure. Let us consider for a moment
what information you need on a score or set of scores:

1. The range of scores. 

2. The distribution. 

3. The frequency with which the same score was obtained. 

4. The average score. 

Why do we need this information? Obviously, if you give a test to your
students with 100 items and no one gets more than ten correct, this is not a
good test for your class. A test, classroom or standardized, must distinguish
between those students who have mastered the subject or possess a given
characteristic and those who have not mastered the material or do not possess
the characteristic. Carried further, the test should tell us the relative degree to
which the student has these skills or abilities.

Even if we have a wide range of scores on our 100-item test, but most of 
the students receive the same score, this test is not distinguishing between 
students. 

All this information can be conveyed quickly and simultaneously by using 
numbers arranged in certain ways. 


Ranking Scores 

One way to see quickly the range of scores is to arrange them in order from
highest to lowest. Let us for a moment assume that we are all teachers of
English. One of us has recently administered a vocabulary test to ten
ninth-grade students. At this time we are primarily interested in scores (not in
total number of questions on the test). Table 5 presents the scores from highest
to lowest. A simple inspection of the scores in Table 5 reveals that they range
from 3 to 8, with all other scores falling in order of magnitude between these
extremes.

Table 5 Scores on Vocabulary Test for Ninth-Grade Students

8
7
7
6
6
6
5
5
4
3
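
The ranking step can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed procedure. The variable names are our own, and the scores are those of Table 5, deliberately shuffled here to stand for the order in which papers might be handed in.

```python
# Table 5 scores in an arbitrary (assumed) order, as papers might be collected.
scores = [6, 8, 5, 7, 3, 6, 4, 7, 5, 6]

ranked = sorted(scores, reverse=True)      # arrange from highest to lowest
lowest, highest = ranked[-1], ranked[0]    # the two extremes give the range

print(ranked)           # [8, 7, 7, 6, 6, 6, 5, 5, 4, 3]
print(lowest, highest)  # 3 8
```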





Frequency Distribution 

We can simplify our picture of these scores by putting all scores that are
the same together. But to account for the number of students having the same
score, we need a new column of figures which will tell us how many students
received that score.

The column for scores we head X; the column that represents the number
of people obtaining that score we head f. N is used to designate the number
of subjects tested.

Using our small group for illustrative purposes, let us look at our English
scores through shorthand. (Obviously the small number cited does not
require shorthand; however, when the numbers are increased a hundredfold
or more, the need becomes more apparent.) Table 6 shows that three students
obtained the score of 6; two students each obtained scores of 5 and 7; and
scores of 8, 4, and 3 were each obtained by one student.

Table 6 Frequency Distribution of Scores on a Vocabulary Test for Ninth-Graders

X    f

8    1
7    2
6    3
5    2
4    1
3    1


The numbers and letters are arranged to form a frequency distribution of
scores. Frequency means number of occurrences. Distribution means the way
in which something occurs. When we put these two words together we have
frequency distribution. Our frequency distribution shows the number of raw
scores and the number of students obtaining a given score. Our shorthand
system allows us to use X to represent raw score. Raw score means simply
the number of questions right. The letter f represents the number or
frequency of students obtaining a given score.
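
A frequency distribution like the one just described can be tallied mechanically. The sketch below uses Python's `collections.Counter` on the ten vocabulary scores. The names `frequency` and `n` are our own stand-ins for the f column and N.

```python
from collections import Counter

scores = [8, 7, 7, 6, 6, 6, 5, 5, 4, 3]  # the ten vocabulary scores

frequency = Counter(scores)       # maps each raw score X to its frequency f
n = sum(frequency.values())       # N, the number of students tested

print("X  f")
for x in sorted(frequency, reverse=True):   # highest score first
    print(x, "", frequency[x])
print("N =", n)
```

With only ten scores the tally is hardly needed, but the same few lines serve unchanged for a thousand.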


Class Interval 

We may further simplify our picture of a set of scores by grouping them.
This is especially helpful when there is a large number of scores. For
example, you have been given the set of IQ scores listed in Table 7. Scores
listed in this manner do not convey much useful information. Concerning
such a set of data we would generally want to know

Table 7 Scores on an IQ Test


125 133 I3S 137 
134 129 144 m 
133 142 115 136 
US 127 127 133 
146 121 119 126 
139 12S 147 m 
114 116 


155 127 140 133 
122 151 129 121 
141 120 125 138 
146 110 116 134 
119 117 124 221 
127 128 129 132 


several things: the average IQ score, the amount of variability in this group,
and the distribution of scores. In order to answer these questions we must
set up a frequency distribution. This time we need to bunch up our scores in
order to save time. To do this we must establish a class interval.

The term class interval may be defined as an arbitrary tool for arranging
data in groups. Our data on IQ tests should be arranged so that they are
easier to handle. One method of arrangement is the class interval system.
Each possible score must be accounted for within the range from highest to
lowest.

The first thing to be decided in constructing the class intervals for this
data is the size of the class interval or group. The size of the class interval is
determined by the rule that there should be not less than ten nor more than
twenty class intervals. The usual grouping is fifteen. In dealing with small
numbers of scores fewer class intervals are favored because of convenience.
In grouping data certain minor errors are introduced into the calculations.
The cruder the grouping (that is, the smaller the number of groups), the greater
the chance for error. In determining the size of the class interval we are guided
by the need to reduce our data to the number of groups chosen.

Let us now look at our scores in Table 7; 155 is the highest and 110 is the
lowest. Thus, our range is from 110 to 155. Because we have a small number
of scores, we will arrange them in ten groups. Now our question is how many
scores will be in each group. In determining this we obtain the range of
scores: add 1 to the highest score and subtract the lowest score (155 + 1 -
110 = 46). We then divide 10 into 46, which is 4.6, round it off to the nearest
whole number, and our class interval is 5. If we had chosen fifteen groups,
our computation would have yielded 3.07 and our class interval would have
been 3. You should remember that the basic purpose of grouping scores is
to make a convenient representation.

In summary, the following steps were involved:


1. Highest score plus 1 minus lowest score (155 + 1 - 110 = 46).

2. Divide the range by number of groups desired (46/10 = 4.6).

3. Round off to nearest whole number (4.6 rounds to 5).
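
The three steps can be sketched as a short Python function. The function name `class_interval_size` is our own, and Python's built-in `round` is assumed to be an acceptable way of rounding to the nearest whole number (it behaves differently from schoolbook rounding only when the fraction is exactly one half).

```python
def class_interval_size(highest, lowest, groups):
    """Return (inclusive range, unrounded size, rounded size) of a class interval."""
    inclusive_range = highest + 1 - lowest   # step 1: highest plus 1 minus lowest
    unrounded = inclusive_range / groups     # step 2: divide by the number of groups
    return inclusive_range, unrounded, round(unrounded)  # step 3: round off


ten_groups = class_interval_size(155, 110, 10)
fifteen_groups = class_interval_size(155, 110, 15)

print(ten_groups)         # (46, 4.6, 5)
print(fifteen_groups[2])  # 3
```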





Table 8 Frequency Distribution of IQ Scores
with Scores Grouped by Class Intervals

Scores      Tally Marks         Frequencies (f)

155-159     |                    1
150-154     |                    1
145-149     |||                  3
140-144     ||||                 4
135-139     ||||| |              6
130-134     ||||| ||             7
125-129     ||||| ||||| ||      12
120-124     ||||| |              6
115-119     ||||| |||            8
110-114     ||                   2

                            Σf = 50 = N


Table 8 shows our data grouped in class intervals of 5. In order to obtain
the frequencies we have used a system of tallies. Each mark in the column
marked “tally marks” represents one individual having a score in that
five-point range. For instance, in the range 140-144 we see that four scores fell
in this range. We do not know the exact score for any of these individuals,
that is, whether a score was 140, 141, 142, 143, or 144. We have to assume
that they were evenly distributed.
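
Tallying scores into class intervals can be sketched as follows. The function `group_scores` and the ten scores in `sample` are hypothetical, invented only to show the mechanics. They are not the fifty IQ scores of Table 7.

```python
from collections import Counter


def group_scores(scores, lowest_limit, size):
    """Tally scores into class intervals; return (lower, upper, f) rows, highest first."""
    counts = Counter((s - lowest_limit) // size for s in scores)
    rows = []
    for i in range(max(counts), -1, -1):
        lower = lowest_limit + i * size
        rows.append((lower, lower + size - 1, counts.get(i, 0)))
    return rows


# Hypothetical scores, not taken from Table 7.
sample = [112, 118, 119, 121, 126, 127, 127, 129, 133, 141]
table = group_scores(sample, lowest_limit=110, size=5)

for lower, upper, f in table:
    print(f"{lower}-{upper}  {'|' * f:<5} {f}")
```

Note that once the scores are grouped, the table keeps only the count in each interval. The individual scores within an interval are no longer visible, which is the loss of detail described above.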


Graphic Representation 

We may also show a set of scores by “drawing a picture” of them. Let us
again refer to the data in Table 8. An inspection of the distribution shows us
that the most frequent scores occur in the interval 125-129, and that the very
low and very high scores are less numerous. The greatest cluster of scores is in
the lower half of the range. The following are two different representations
of the same data in pictorial form:

In Figures 6 and 7, the frequency of the IQ scores can be more readily
viewed than in Table 8. The histogram (Figure 6) is sometimes referred to as
“piling up the bodies,” because each square represents an individual who
obtained that score. When more than one score falls in a given class interval,
it is represented by making that pile another square higher. The score intervals
can be seen along the abscissa (horizontal base-line). The ordinate (vertical
height) represents the frequencies. Thus, we read from the histogram that
seven individuals scored in the range 130-134 and so forth.

The same data is pictured in Figure 7 in the form of a frequency polygon.




Figure 6. Histogram of fifty IQ scores.


The midpoint of each of the score intervals has been plotted. The height of
the point equals the frequency in the interval. These points have been
connected to show graphically the distribution of scores. The histogram and
frequency polygon are generally similar devices to illustrate the same facts.
There are, however, advantages and disadvantages to both. On the whole,
the frequency polygon is generally preferred to the histogram because it
gives a better showing of the shape of the distribution. The student who is
interested in pursuing this area in more depth should consult one of the
many statistics texts which offer a more definitive treatment of the subject.
(See, for example, Guilford, 1965; Gourevitch, 1965; Chase, 1967.)
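
Both pictures can be roughed out from the frequencies in Table 8 without any drawing tools. The sketch below prints a sideways histogram, one mark per individual, and computes the points a frequency polygon would connect. The variable names are our own, and the text rendering is only a stand-in for the printed figures.

```python
lower_limits = list(range(110, 160, 5))          # 110, 115, ..., 155
frequencies = [2, 8, 6, 12, 7, 6, 4, 3, 1, 1]    # Table 8, lowest interval first

# Histogram: one "#" for each individual in the interval.
for lower, f in zip(lower_limits, frequencies):
    print(f"{lower}-{lower + 4} {'#' * f}")

# Frequency polygon: plot each frequency above the midpoint of its interval.
polygon_points = [(lower + 2, f) for lower, f in zip(lower_limits, frequencies)]
print(polygon_points[3])  # (127, 12), the peak of the distribution
```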





Measures of Central Tendency 

Another piece of information we need concerning a set of scores
is the average. A measure of central tendency is just that: the average of a
set of scores.

The teacher who is planning materials for her seventh-grade class needs to
know something about the general abilities and achievements of her students.
If, for example, she is going to order a set of reading books, she needs to know
how well her students read in order to purchase the books appropriate
to their level of skills. Averages give us clues to the characteristics
of the group.

Chase (1967) cites three principal uses of averages: 

(1) They indicate the amount of a given condition which is typical 
for a defined group of individuals; 

(2) They provide a basis for comparing a condition in one group
with the same condition in a second group; and

(3) They allow us to estimate a typical condition for many
individuals when we have measurements on only a portion of the total
number of those individuals (p. 27).


Many people think of an “average” as the sole product of an arithmetic
process, but there are different types of averages that are useful in portraying
what is typical of a given group. Each of these is calculated by a different
procedure. These averages are the mode, the median, and the mean. Thus these
three measures that indicate a typical condition or the average performance
of a given group are what we mean by measures of central tendency. They are
called measures of central tendency rather than averages because the word
average is a general term. And because there are different types of statistical
averages, it is a meaningless word without proper reference to the kind of
average being described. For example, in describing a group’s reading level
as at the national median for seventh-grade students not only are you signifying
that they are typical or average in performance, but you are stating how this
average was computed. You are stating precisely your statistical frame of
reference. Let us look at these three measures of central tendency in terms
of what they mean and how they may be derived.


Mean (X̄)

The mean* is one of the more widely used measures of central tendency;
it is the arithmetic average that you have been using for years in computing
your grade averages in school. For example, let us suppose that you have been
asked by the Dean of Pedagogy to submit your overall undergraduate grade
average for evaluation. This is a rather simple procedure for you. You sit
down and count the number of A’s, B’s, C’s, D’s, and F’s and their hours of
credit. Let us suppose that at your college an A is equal to 4 points; B equal
to 3 points; C equal to 2 points; D equal to 1 point; and F equal to zero.
Let us also suppose that all your 120 hours were made up of three-hour
courses. Thus you would have taken forty three-hour courses (120 ÷ 3 = 40).

* It should be noted that two other types of measures of central tendency are
also used. These are the geometric mean and the harmonic mean. These are
rarely used by teachers, guidance counselors, or psychologists and need not
concern the beginning student.

Now let us use the frequency distribution to help compute your grade
average. Remember X = raw score and f = frequency of these scores. Upon
inspection of your transcript you find that you earned the following grades:
(1) ten A’s; (2) twenty B’s; (3) five C’s; (4) two D’s; and (5) three F’s.


Grades and Their Values in Tabular Form

X      f      fX

4     10      40
3     20      60
2      5      10
1      2       2
0      3       0

N = 40    ΣfX = 112


Here we have introduced some new symbols. You remember that N equals
number; in this case N = number of courses. In other situations it may
equal number of scores, number of persons, and so forth. Our new symbol is
Σ. Σ equals summation (“adding up” or “sum up”). In your situation, then,
N equals the number of courses you have taken and ΣfX equals the sum of the
frequency of these grades multiplied by the weighted grade. Thus, fX is
frequency times grade weight or scores. A is equal to four points. You received
ten A’s. Ten times the weighted grade of A equals 40. Let us continue to
figure out your grade average. We know that you had forty courses (N = 40)
and the sum of these times their frequency (ΣfX) after weighting equals 112.
You know that in order to find the average you must now divide the total
or sum of the weighted scores by the number of courses. Your arithmetical
procedure is the following:


       2.80
40 ) 112.00
      80
      32 0
      32 0
         00

Thus, your undergraduate grade average is 2.80, or a C+. Those D’s and F’s
hurt your grade average and worked against the effect of ten A’s and twenty
B’s. On the other hand, the A’s and B’s helped you in making up for the three
failures and two near failures.
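
The same computation can be sketched in Python, working directly from the frequency distribution. The dictionary `grades` pairs each grade weight (X) with the number of courses earning it (f). The names are our own.

```python
# Grade weight (X) mapped to the number of courses earning it (f).
grades = {4: 10, 3: 20, 2: 5, 1: 2, 0: 3}

n = sum(grades.values())                        # N, the number of courses
sum_fx = sum(x * f for x, f in grades.items())  # the sum of fX
mean = sum_fx / n                               # divide the sum by N

print(n, sum_fx, mean)  # 40 112 2.8
```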

The following frequency distribution allows us to look at your grade average
through the symbols of statistics:

Frequency Distribution of College Grades and
Steps in Calculating Grade Point Average

X      f      fX

4     10      40
3     20      60
2      5      10
1      2       2
0      3       0

N = 40    ΣfX = 112

X̄ = ΣfX / N = 112 / 40 = 2.80

You have just computed the mean or arithmetic average. The mean grade is
therefore 2.80.

The mean uses more of the information in a distribution than the
mode and median. Less information is lost in computing it than in computing
the others in that the mean uses all the scores, and so it provides a more
sensitive index of central tendency. There are situations, however, when the
use of every score can be a disadvantage. For example, you are attempting to
find out the average income in your district. There are 10,000 wage earners,
most of whom earn salaries in a range that runs up to $12,000. There are,
however, ten people who earn $100,000 or more. The arithmetic average would
not give a typical picture of income in the district.

Median (Mdn.)

In our previous example the effect of a few high incomes was shown to
distort the usefulness of the mean. The median would not be so affected
because when computing it we simply count cases up or down; a very high
income has an equal place with any other income. The mean, by contrast,
can give a distorted picture.



Table 9 Average Incomes of Nine Wage Earners Using the Mean and Median

Wage Earner    Annual Income    Median Procedure

Jones            $100,000       (1) Counting down five cases
Smith               9,000       (2)
Leslie              8,500       (3)
Stern               8,300       (4)
Foster              8,000       (5) median
Stevens             7,500       (4)
Shoemaker           7,200       (3)
Marty               7,000       (2)
Nelson              6,500       (1) Counting up five cases

N = 9            $162,000

Mean Procedure: $162,000 ÷ 9 = $18,000

Mean = $18,000
Median = $8,000

Table 9 shows how Jones’ income changes the mean but has little influence on
the median.

In computing the median we count up or down to exactly half of the scores 
or in this case the wage earners. Thus with nine salaries represented the 
median would be the fifth, which is $8,000. The mean income for this group 
is $18,000. We see that the mean is $10,000 more than the median. Both the 
median and the mean are “averages,” but for this data the median is a more 
accurate representation of the typical wage of the group. 
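
The contrast between the two averages is easy to check. The sketch below applies the `mean` and `median` functions of Python's standard `statistics` module to the nine annual incomes of Table 9. The variable names are our own.

```python
from statistics import mean, median

# Annual incomes of the nine wage earners in Table 9, highest first.
incomes = [100_000, 9_000, 8_500, 8_300, 8_000, 7_500, 7_200, 7_000, 6_500]

mean_income = mean(incomes)       # every income counts, including the extreme one
median_income = median(incomes)   # the fifth of nine cases, counting up or down

print(mean_income, median_income)
print(mean_income - median_income)   # the gap produced by one very high income
```

Replacing the highest income with a more ordinary figure would pull the mean down sharply while leaving the median where it is, which is the point of the example.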

Theoretically, in a normal distribution curve the mean, median, and mode
will all fall on the same point. We shall discuss this further in our discussion 
of the normal curve. If you have a large discrepancy between any of these 
three measures of central tendency for a group of data, a visual inspection is 
called for to determine what is distorting the figures. 


Mode (Mo.) 

The mode is a very crude statistic and generally not very useful. However,
it is easy to find and gives a quick and rough measure of central tendency.
For example, you have fifty English test scores and you want to know five
minutes before a department meeting what the typical score was. You might
look at the score that occurs most frequently and have a rough approximation;
this most frequent score would be the mode. It is of course very possible
that the mode may not indicate anything at all and be far removed from the
average score. Therefore, it is recommended that you never use it except in
the most pressing cases and use it then with the utmost reservations.
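
Finding the mode takes one line with Python's `statistics.multimode`, which returns every score tied for most frequent. The first list is the set of ten vocabulary scores used earlier in this chapter. The second list is hypothetical, made up to show how the mode can mislead when two distant scores tie.

```python
from statistics import multimode

vocabulary_scores = [8, 7, 7, 6, 6, 6, 5, 5, 4, 3]
print(multimode(vocabulary_scores))   # [6]

# Hypothetical scores with two modes far from each other and from the center.
split_scores = [3, 3, 9, 9, 5]
print(multimode(split_scores))        # [3, 9]
```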




Measures of Variability 

In describing a set of scores, in addition to central tendency it is desirable 
to have some data on how scores vary from the measure of central tendency. 
The most commonly used index of variability is called the standard deviation.

Standard Deviation 

Generally, the term standard deviation is used to signify variability from
the mean. Popham (1967) describes it quite well when he says, “Actually,
the standard deviation is somewhat analogous to the mean. While the mean
is an average of the scores in a set, the standard deviation is a sort of average
of how distant the individual scores in a distribution are removed from the
mean itself” (p. 16). Put another way, it is a statistic that portrays a given
distribution of scores. A large standard deviation indicates a wide spread
of scores whereas a small standard deviation reveals less score spread.
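
To see what “a sort of average of how distant the individual scores are from the mean” amounts to, the sketch below works it out for the ten vocabulary scores used earlier in this chapter. It assumes the common formula in which the squared distances are averaged over N and the square root is then taken. This section does not spell out a formula, and some texts divide by N - 1 instead.

```python
from math import sqrt

scores = [8, 7, 7, 6, 6, 6, 5, 5, 4, 3]   # the ten vocabulary scores

n = len(scores)
mean = sum(scores) / n                                  # 5.7
squared_distances = [(x - mean) ** 2 for x in scores]   # distance of each score, squared
variance = sum(squared_distances) / n                   # average squared distance (assumes N)
standard_deviation = sqrt(variance)

print(round(mean, 2), round(variance, 2), round(standard_deviation, 2))  # 5.7 2.01 1.42
```

A set of scores bunched more tightly around 5.7 would give a smaller figure, and a more scattered set a larger one.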

If we turn our attention to a particular type of distribution, which is called
the “normal” curve, we may perceive standard deviation more clearly. A
normal distribution is a symmetrical, bell-shaped curve, which represents a
theoretical distribution of scores.

Figure 8 represents a normal curve in which you will note that most of 
the space is in the center with an equally decreasing amount as we go from 
the midline. Note that a line is drawn bisecting the curve into two equal 
portions. This line represents the mean (as well, of course, as the median 
and mode). 

In a normal curve the area on both sides of the mean in which approx- 
imately 68 per cent of the scores fall is designated as plus and minus one 
standard deviation from the mean. 

Figure 9 reveals the normal curve with the standard deviations and the
percentages of cases in each deviation. The whole area of the curve represents
the total universe or total number of scores in the distribution. Vertical lines
have been drawn to the base line at the mean and at intervals designated as
+1 to +4 on the right and -1 to -4 on the left. The areas between these
lines contain the percentages of cases or people to be found under that section
of the curve. In the theoretical normal curve there is an exact relationship
(mathematical) between the standard deviation and the ratio or proportion
of cases. In an IQ test with a normal distribution which has a mean of 100
and a standard deviation of 10, 34.13 per cent of the scores would fall
somewhere between the mean and +1 standard deviation, or between 100 and 110
IQ points. In the same way 34.13 per cent of the scores would fall somewhere
between the mean and -1 standard deviation, or between 90 and 100 IQ
points.

It can thus be seen that 68.26 per cent of cases shown on a “normal” curve
occur somewhere between +1 and -1 standard deviations from the mean.
This means approximately two thirds of all scores in a normal distribution
lie within ±1 standard deviation of the mean. Similarly, a little over
95 per cent (95.44) of scores will fall between +2 and -2 standard deviations
from the mean, and almost all the scores (99.7 per cent) will be somewhere
between +3 and -3 standard deviations from the mean. Standard deviation
units and percentile equivalents may be seen in Table 10.


Table 10 Standard Deviations and Percentile
Equivalents*

Standard Deviation    Percentile Equivalents

+3                    99
+2                    98
+1                    84
0 (X̄)                 50
-1                    16
-2                    2
-3                    0.1

* Note that percentiles are rounded.
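
The proportions of cases under the normal curve can be checked with the `NormalDist` class in Python's standard `statistics` module (available from Python 3.8). The sketch below is an illustration under that assumption. The dictionary name `within` is our own.

```python
from statistics import NormalDist

unit_normal = NormalDist()   # theoretical normal curve: mean 0, standard deviation 1

# Proportion of cases lying within 1, 2, and 3 standard deviations of the mean.
within = {k: unit_normal.cdf(k) - unit_normal.cdf(-k) for k in (1, 2, 3)}

for k, proportion in within.items():
    print(k, round(100 * proportion, 2))   # 68.27, 95.45, 99.73
```

The computed figures differ from 68.26 and 95.44 only in the last digit, because those values come from adding percentages that had already been rounded.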


The fact that almost all scores will always be found in a
normal curve between ±3 standard deviations from the mean enables us to
use it as a gauge to compare groups and individuals. Most distributions, of
course, are not “normal” and the exact relationships will not always hold.
It is true, however, that many distributions are close enough to the “normal”
curve that we may make certain assumptions; the standard deviation in
these situations means almost the same thing as it would in the “normal”
curve.


If you have a student who is +1 standard deviation from the mean on a
test (if, of course, the group at least approximates the “normal” distribution),
you can state that he is at the 84th percentile compared to the group of
people that the test producer used in determining the distribution. Put
another way you can say that roughly he surpasses 84 per cent of the group
to whom he is being compared (50 per cent below the mean and 34 per cent
between the mean and +1 standard deviation).
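
The step from standard deviation units to a percentile can be sketched with `statistics.NormalDist`. The example assumes scores that are exactly normally distributed, and it borrows the IQ illustration of a mean of 100 and a standard deviation of 10. The student's score of 110 is a hypothetical case.

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=10)   # assumed normal distribution of IQ scores

score = 110                                     # hypothetical student's score
deviation_units = (score - iq.mean) / iq.stdev  # +1.0 standard deviation
percentile = 100 * iq.cdf(score)                # per cent of the group surpassed

print(deviation_units, round(percentile, 2))    # 1.0 84.13
```

Rounded to a whole number, this is the 84th percentile: 50 per cent below the mean plus about 34 per cent between the mean and +1 standard deviation.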

In terms of scores, the standard deviation becomes greater as the scores
are more widely spread. That is, the larger the standard deviation, the
wider the spread of scores. Thus, we have some idea of the variability of a
set of scores by the size of the standard deviation. Later in this chapter we
shall discuss standard scores in relationship to the normal curve.
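
The link between spread and the size of the standard deviation can be seen with two small sets of scores. The scores below are hypothetical, and the sketch assumes the form of the standard deviation that divides by N (Python's `pstdev`); a text that divides by N - 1 would give slightly larger values.

```python
from statistics import mean, pstdev

# Two hypothetical sets of scores with the same mean but different spread.
narrow = [48, 49, 50, 51, 52]
wide = [30, 40, 50, 60, 70]

for label, scores in (("narrow", narrow), ("wide", wide)):
    print(label, "mean =", mean(scores), "SD =", round(pstdev(scores), 2))
```

Both sets average 50, but the widely spread set has a standard deviation ten times as large.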


Correlation

In our discussion of validity and reliability (see Chapter 4) we have already
encountered this area of statistics. Correlation is a measure of relationship.
What is the relationship, for example, between high school grades and college
success? Or what is the relationship between parental education and children's
success in school? These and other questions may be answered (at least in
part) by the use of a statistic called the correlation coefficient (r). We stated
that a correlation of +1 signifies a perfect positive relationship, whereas a -1
indicates a perfect inverse relationship. A zero correlation coefficient means
there is no relationship. Before you read further, go back to Chapter 4 and
review correlation.

Table 11 presents two sets of scores, one set on test X and one on test Y.
These scores were made by the same students, each student having one
score on both test X and test Y. If we inspect Table 11 we note that the
student who made the highest score on test X made the highest score on
test Y. The student who made the second highest score on test X made the
second highest score on test Y, and this exact correspondence continues
throughout the scores. This illustrates a perfect (+1) correlation.
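
The perfect correlation in Table 11 can be verified directly. The sketch below uses the usual product-moment formula for r; the function name `pearson_r` is ours, and the scores are those listed in Table 11.

```python
from math import sqrt

# Scores from Table 11 (ten students, two tests).
test_x = [12, 13, 16, 17, 18, 20, 21, 24, 26, 27]
test_y = [14, 15, 18, 19, 20, 22, 23, 26, 28, 29]

def pearson_r(xs, ys):
    """Product-moment correlation coefficient between two sets of scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]          # deviations from the mean of X
    dy = [y - mean_y for y in ys]          # deviations from the mean of Y
    cross = sum(a * b for a, b in zip(dx, dy))
    return cross / sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

r = pearson_r(test_x, test_y)
print(round(r, 4))
```

Every student's Y score is exactly two points above his X score, which is why the coefficient comes out at +1.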

Table 11  Scores of Ten Students on Two Tests

Students      Test X      Test Y
James           12          14
Mary            13          15
David           16          18
Joe             17          19
Kim             18          20
Cathy           20          22
Lou             21          23
Lynn            24          26
Sue             26          28
Beth            27          29


Figure 10 reveals the scores of Table 11 in a scatter diagram format.
Note the class intervals that we talked about before are now used in this
diagram. The scores have been grouped in class intervals of two. Each
column in the diagram intersects with every row. This creates a cell at each
intersection. Every cell signifies a unique combination of one of the scores
of test X with one of the scores of test Y. For example, Mary has a score of
13 on test X and a score of 15 on test Y. Look at Figure 10 and locate at the
top the class interval containing Mary's X test score. This is in the second
column from the left. Move down this column until you reach the row marked,
on the left margin, by the interval containing Mary's Y test score, which is
the third row from the bottom. Thus, our tally is placed in the cell made by
the intersection of that row and column.

Note that all the tallies fall along a straight line. This form of relationship
is an indication of a linear correlation. There is a direct and high relationship
between the two sets of scores.

The teacher and counselor should remember that correlation is a measure
of mutual relation. Our basic question is, "What is the relationship between
two or more variables?" As you know, correlation coefficients tell us many
things. Of special importance to the student of measurement is their use with
reliability, validity, and prediction. Correlation is a basic tool in estimating
these very important factors.

Norms 

Mrs. Stone was so proud of her son Robert because he had an IQ of 100.
Mrs. Stone thought, of course, that 100 was a perfect score. She did not
know that an IQ of 100 was average. As future teachers or counselors you
should remember that no score is meaningful unless you know on what
basis it was derived. In one test, for example, a score of 200 would really
mean that the student did not answer even one question correctly. Always
ask, then, for the frame of reference of the child's test scores. A number is
meaningless without it. Even a simple raw score which indicates the number
of items answered correctly is not meaningful unless you know how many
questions were asked and what the factors are that indicate a good showing.
In order to convey meaningful information about test performance and
scores, we translate raw scores into norms. These norms are reference
points that compare raw scores on different factors. For example, a
score of 10 on a given test may mean one thing at the first-grade level,
whereas the same score at a later grade level may indicate retardation.
Thus, a raw score takes on meaning only when we refer it to the appropriate
norms. Some of the norms we shall discuss are age norms, grade norms,
percentile norms, and standard scores.


[Figure 10. Scatter diagram from Table 11. Scores on Test X run across the
top; scores on Test Y run down the left margin.]


Age Norms 

Educational Age

Educational age is sometimes used as a norm. The norms are based on the
scores of children from different schools and geographic locations and ages.
Average scores are obtained for each age level. The raw score of a pupil is
compared to these norms and an educational age (EA) is assigned. If, for
example, a twelve-year-old achieves an educational age score of 12, he would
be considered perfectly average. If he obtained an educational age score of
10, we would interpret this as meaning he is behind his age group. These
norms are most appropriate at the elementary level; however, in the area
of achievement, there is serious question of their worth. Children mature at
different rates and the equality of educational age norms is open to question.
More importantly, there is no way of comparing educational ages, because
all achievement tests are based upon unequal and different units of
measurement.


Mental Age 

The mental age (MA) norm is provided by some test publishers for the
interpretation of intelligence test scores. A child is compared with the norm
of his age group. In this way we may know whether he is more "intelligent"
or less "intelligent" than the average child of his age. The eight-year-old
child who performs as well as the average ten-year-old is said to have a
mental age of 10.

This procedure, because of its difficulties in construction as well as other
factors, is being slowly phased out of testing. It is very difficult, for example,
to get a representative group of children of a given age. They are many times
located in different grades. The equality of mental age units, as in
educational age norms, is doubtful. This is especially true as an individual
goes from adolescence to adulthood and age ceases to have very much meaning.
If mental age is to be used, it is most appropriate in interpreting general
intelligence. We will have more to say on mental age in our discussion of
intelligence quotients. It should, however, be noted that although Binet
introduced the concept of mental age, in the latest revision of the
Stanford-Binet standard scores replace age scales (Terman and Merrill, 1960).


Grade Norms 

The grade norm (also called the grade-placement norm) is very similar to
the age norm. It is obtained by finding the average scores for students at
different grade levels. The same process of finding representative groups
of pupils is used as in finding educational age norms.

The chief advantage of grade norms over age norms is that comparisons
are made among children who have had the same amount of educational
exposure. The standard method of expressing grade norms is by assigning
a number to each grade. For example, the number 6.0 would indicate average
performance at the beginning of sixth grade; 6.5 would indicate average
performance in the middle of the school year or grade. The tenths of grade
placement for any testing date may be ascertained by an inspection of Table
12. Note that beneath the dates are tenths of the school year for students
entering in September. For example, a student in the eighth grade during
the period between February 16 and March 15 who is exactly average on a
given test would have a grade equivalent of 8.6. That is, his performance is
about equal to that of the typical student who has completed 0.6 of the
eighth grade.

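Table 12  Grade Placement at Time of Testing

[Table body not recoverable: testing dates, with the tenths of the school
year beneath each date, for students entering in September.]

Reproduced from Stanford Achievement Test Battery, Advanced, Directions for
Administering. Copyright 1964 by Harcourt, Brace & World, Inc. Reproduced by
special permission of the publisher.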


Table 13 presents a hypothetical distribution of raw scores and their
grade equivalents which would be similar to normative data presented in a
test manual. Thus, if Ray Gold, an eighth-grade student, had a raw score of
25, his grade equivalent or grade placement score would be 10.8.


Table 13  Raw Scores and Their Grade Equivalents for the X Test of Social
Studies for Junior High Students

Raw Score   Grade Equivalent      Raw Score   Grade Equivalent
   30            12.5                15             6.8
   29            12.0                14             6.6
   28            11.8                13             6.3
   27            11.6                12             6.1
   26            11.2                11             5.9
   25            10.8                10             5.5
   24            10.3                 9             5.0
   23             9.7                 8             4.5
   22             9.2                 7             4.0
   21             8.7                 6             3.5
   20             8.4                 5             3.0
   19             8.0                 4             2.5
   18             7.7                 3             2.0
   17             7.3                 2             1.5
   16             7.1                 1
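
Reading a grade equivalent from a table such as Table 13 is a simple lookup. The sketch below copies only the upper half of the table; the names `GRADE_EQUIVALENTS` and `grade_equivalent` are ours, chosen for illustration.

```python
# Grade equivalents from Table 13 (hypothetical X Test of Social Studies).
# Only part of the table is copied here to keep the sketch short.
GRADE_EQUIVALENTS = {
    30: 12.5, 29: 12.0, 28: 11.8, 27: 11.6, 26: 11.2,
    25: 10.8, 24: 10.3, 23: 9.7, 22: 9.2, 21: 8.7,
    20: 8.4, 19: 8.0, 18: 7.7, 17: 7.3, 16: 7.1,
}

def grade_equivalent(raw_score):
    """Return the grade equivalent for a raw score, or None if not tabled here."""
    return GRADE_EQUIVALENTS.get(raw_score)

ray = grade_equivalent(25)                      # Ray Gold's raw score
grade, tenths = divmod(round(ray * 10), 10)     # split 10.8 into grade 10, tenth 8
print(f"Raw score 25 -> grade equivalent {ray} (grade {grade}, tenth {tenths})")
```

The lookup reports what the average pupil at grade 10.8 scored on this test; it does not by itself show that Ray can do tenth-grade work.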




Most measurement authorities today question the advisability of using
grade norms, especially in reporting test results to parents. This is because
they seem to be so simple that misunderstandings often result from their
use. At first glance one is apt to interpret our previous example as
indicating that the student is advanced enough to work at a higher grade
than he is actually able to do. This assumption could be entirely untrue.
Thus, Ray Gold's score indicates that he has obtained a score equal to the
average score earned by children in the tenth grade. This may only mean
that he has mastered most of the work at or below his grade level. The
average eighth-grader, on the other hand, will of course miss more of the items.

Durost (1961b) illustrates quite well the basic point of our discussion
when he states,

    A fifth-grade child who has really learned to compute accurately
    may do ten straight computation examples without error, while the
    average child who has not mastered all his number combinations
    or is unsure in borrowing or carrying will miss several of these
    problems. The higher score earned by the first child will result in
    his receiving a grade equivalent substantially beyond his grade
    placement; yet he could not work at that level successfully because
    he has not been exposed to the new processes and learnings normally
    taught in the higher grade [p. 1].

Percentile Norms 

Percentiles are as easily understood as grade norms and do not suffer
from the same limitations. A percentile rank indicates the proportion
of students who fall below a given score. It does not mean the percentage
of questions answered correctly. It means the percentage of people whose
performance a student has equaled or surpassed. Thus, if 75 per cent of the
students to whom Betty is being compared score lower, she is at the 75th
percentile. That is, on a given test Betty has done better than 75 per cent
of the students taking the test and 25 per cent have scored higher.
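
The difference between a percentile rank and the per cent of items answered correctly can be made concrete with a short sketch. The comparison group, Betty's raw score, and the 50-item test length below are all hypothetical, and the function counts only scores strictly below the given score, which is one common convention.

```python
def percentile_rank(score, group_scores):
    """Per cent of the comparison group scoring below the given score."""
    below = sum(1 for s in group_scores if s < score)
    return 100.0 * below / len(group_scores)

# Hypothetical raw scores for a comparison group of 20 students.
group = [12, 15, 17, 18, 20, 21, 22, 23, 24, 25,
         26, 27, 28, 29, 30, 32, 34, 36, 38, 40]
betty = 31  # hypothetical raw score on a 50-item test

rank = percentile_rank(betty, group)
percent_correct = 100.0 * betty / 50
print(f"percentile rank: {rank:.0f}, per cent of items correct: {percent_correct:.0f}")
```

Here Betty answers 62 per cent of the items correctly yet stands at the 75th percentile, because fifteen of the twenty students in the group scored lower.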

Tables 14 and 15 present percentile norms for the Differential Aptitude 
Tests. The norms in these Tables are for tenth-grade students in their first 
semester (fall of the year). The test manual presents similar norm tables for 
boys and girls from grade eight through grade twelve. Note that Table 14
presents norms for boys and Table 15 gives norms for girls. The reference, 
or norm, group is especially important for accurate and meaningful test 
interpretations. The teacher or counselor inspecting these tables is im- 
mediately given data to assist him in knowing to whom he is comparing his 
students. 

An inspection of Tables 14 and 15 reveals raw scores under the various
tests in the battery with percentile rankings for each score. Let us suppose
we want to find the percentile rankings for two students on the abstract
reasoning test (Abst. Reas.). Both Susan and Wyatt have a raw score of 42.
Looking at the numbers under "Abst. Reas." for boys we find that a raw
score of 42 is equivalent to the 90th percentile. Wyatt's percentile norm for
"Abst. Reas." is therefore at the 90th percentile level. Susan has the same
raw score as Wyatt. Do you think her percentile level will therefore be the
same? If Susan were a boy her percentile level would be the same. Susan
is a girl, however, and we must, on this test, compare her to other
tenth-grade females. Table 15 presents norms for girls similar to Susan in
that they were tested in the tenth grade during the first semester in the fall
of the year. Looking under "Abst. Reas." we find Susan's raw score of 42
equivalent to a different percentile level from Wyatt's.

Tables 14 and 15, except for the top and bottom, present percentiles in
steps of five. Several raw scores are equivalent to a single percentile. Though
we could be more exact, this type of table is sufficient in most cases.

Percentiles are extensively used because they are easy to interpret and do
not have the glaring defects of age and grade norms. They have been used to
report scores on a wide variety of measurement instruments such as
intelligence, aptitude, and achievement tests, and may be used with any group.
There are, however, limitations. First of all, the group that is being used
as a reference point presents some problems. This has been seen in our
discussion of Wyatt and Susan's test scores. The difference in the raw scores
and percentile equivalents on the girls' and boys' norms was noted. It is
obvious that we need different norm groups for such factors as sex, age, and
grade. The group to which you compare a pupil must be the group to which he
or she belongs. It is, for example, inappropriate to compare a college
applicant's academic aptitude scores to those of the general high school
population. His scores must be compared to those of other college applicants.






Table 14  Percentile Norms for Boys for the Differential Aptitude Tests,
Form L, Fall (First Semester)

[Table body not recoverable.]


... to divide the norms according to educational level. In fact, it could be
that Smith might not make the officer corps because he was being compared
only to college graduates. On the other hand, Jones might qualify for the
corps because he was the highest in his group. Thus, normative groups must be
relevant for the decisions to be made.

There must be many sets of norms for a given test. The test user may then
choose those most pertinent to his situation and needs. There are
limits, however, to the number of norm groups that can be supplied by a
test publisher. Schools, therefore, need to provide their own "local norms" to
supplement the published percentile norms. The use of "local norms"
helps the school to determine the relative standing of pupils in its own
system. In some situations this comparison is more significant than
the use of national norms. Let us look, for example, at three different high
schools.

Stuart High School is a secondary school located in an upper-middle- to
lower-upper-class suburban community. The children who attend Stuart 
have been exposed to a great many cultural and educational advantages. On 
a standardized test of academic ability the average pupil’s score at Stuart is 
at the 75th percentile level (in the national population the average child is, 
of course, at the 50th percentile level). It is obvious, therefore, that Stuart 
must develop its own norms in order to have more meaningful comparisons 
and to place children in appropriate classes. 

On the other hand, Lincoln High School is a secondary school located in 
a large urban complex referred to as the “inner-city.” The average student 
at Lincoln on the same test of academic aptitude scores at the 35th percentile. 
Thus, Lincoln needs to develop local percentile norms for the same reasons 
as Stuart. 

Pearson High School is also located in a large metropolitan area. However, 
Pearson draws students from diverse socioeconomic backgrounds. The 
average student at Pearson scores at the 55th percentile level on the same 
academic aptitude test. For Pearson the national percentile norms supplied 
by the test publisher seem to be appropriate and a valid reference point. 

Another caution in the use of percentiles involves the interpretation of
differences in percentile rankings. A student, for example, who ranks at the
95th percentile may get five or six more items correct than his classmate
who is ranked at the 90th percentile. The student, however, who is at the
55th percentile may get only one or two more items correct than his friend
who is at the 50th percentile. (See Figure 11.)

Percentile units are unequal. Remember, an equal percentile difference
does not necessarily represent equal raw score differences. The inequality
of percentiles may be seen in Figure 11. Note alongside the "percentile
equivalents" the closeness of percentiles in the middle and the distance
between them at both extremes. See, for example, how far apart the 95th and
99th percentiles are, or the 1st and 5th percentiles, as compared to
percentiles in the 20th to 80th percentile range.
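
The inequality of percentile units can be measured on an ideal normal curve. The sketch below is illustrative only; it assumes a perfectly normal distribution, and the helper name `sd_gap` is ours.

```python
from statistics import NormalDist

curve = NormalDist()  # standard normal curve: mean 0, SD 1

def sd_gap(lower_pct, upper_pct):
    """Distance, in standard deviation units, between two percentile points."""
    return curve.inv_cdf(upper_pct / 100) - curve.inv_cdf(lower_pct / 100)

middle = sd_gap(50, 55)
upper = sd_gap(90, 95)
extreme = sd_gap(95, 99)
print(f"50th to 55th: {middle:.2f} SD")
print(f"90th to 95th: {upper:.2f} SD")
print(f"95th to 99th: {extreme:.2f} SD")
```

Five percentile points near the top of the curve cover almost three times the distance of five points in the middle, and the four points from the 95th to the 99th percentile cover more still.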



Percentile rankings are reference points which provide us with a basis for
interpreting a score of an individual in relation to a given group. It is
important that the group is relevant to the individual and our purposes for
comparison. It is also important to bear in mind that percentile units are
unequal and that at the extremes of the normal curve five percentile units
are not equivalent to five percentile units in the middle.


Standard Scores 


In our previous discussion of raw scores the rationale for converting
scores into agreed upon or standard units was presented. The use of standard
scores is another method to provide comparability to the meaning of a raw
score. Thus, the tester in Rich Square, North Carolina, and the tester in
Tokyo, Japan, would both have a similar frame of reference. This standard
frame of reference is the normal curve. A standard score is based upon the
number of standard deviations a given score is from the mean. Typical
standard scores can be seen in Figure 11. Before continuing, turn back to
the section on standard deviation and review the concept. An understanding
of the standard deviation is very helpful in attaining insight into the standard
score mechanism.

In Figure 11, the first standard scores are the z-scores, which are equivalent
to the standard deviations. Inspecting our normal curve in Figure 11 we
can see that a standard z-score of +1 has between it and the mean 34 per
cent of the cases. A standard score of +1 is then equivalent to the 84th
percentile and a standard score of -1 is equivalent to the 16th percentile.
This relationship is always the same with standard scores for any normal
distribution. If we look further we can see that -4 and +4 standard scores
are equal distances below and above the mean. Thus, the standard score does
not suffer from the same defect of unequal units as do the percentile
equivalents.

It should be noted that z-scores may be computed for any type of
distribution by equating the mean to 0.00 and the standard deviation to 1.00.
The z-score equivalents in Figure 11 are correct only for a normal
distribution.
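
Computing z-scores for a set of raw scores takes only the mean and the standard deviation. The raw scores below are hypothetical and deliberately lopsided, to show that the conversion itself does not require a normal distribution; the sketch assumes the form of the standard deviation that divides by N.

```python
from statistics import mean, pstdev

def z_scores(raw_scores):
    """Convert raw scores to z-scores: (score - mean) / standard deviation."""
    m = mean(raw_scores)
    sd = pstdev(raw_scores)  # assumption: population form (divide by N)
    return [(x - m) / sd for x in raw_scores]

# Hypothetical, deliberately lopsided set of raw scores.
raw = [2, 3, 3, 4, 4, 4, 5, 9, 11, 15]
z = z_scores(raw)
print([round(v, 2) for v in z])
print(round(mean(z), 6), round(pstdev(z), 6))
```

The converted scores have a mean of 0 and a standard deviation of 1, but they keep the lopsided shape of the raw scores, so the percentile equivalents of the normal curve would not apply to them.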

The use of z-scores presents some difficulties. First of all, a standard score
of zero is incorrectly interpreted by some to mean a very poor performance
rather than the mean or average performance. The z-score has two other
disadvantages in that half are negative values and many involve decimal
fractions. To eliminate these awkward and time-consuming disadvantages,
different standard score systems have been developed. An inspection of
Typical Standard Scores in Figure 11 reveals three other types of standard
scores that do not suffer from these disadvantages.

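T-scores are expressed in whole numbers with a mean of 50 and a standard
deviation of 10. A T-score of 75 would be equivalent to a z-score of +2.5.
This scale eliminates negative numbers and decimal fractions.

[Figure 11. Normal curve with percentile equivalents and typical standard
scores: z-scores, T-scores, CEEB scores, AGCT scores, stanines, Wechsler
Scales, and deviation IQ's.]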

The CEEB (College Entrance Examination Board) scale has a mean of 500 and a
standard deviation of 100. This eliminates both decimals and negative
numbers. A high school senior, for example, who obtains a score of 600 on
one of the tests would be at the 84th percentile.

The AGCT (Army General Classification Tests) scores, as can be seen in
Figure 11, have a mean of 100 and a standard deviation of twenty points.
This scale was developed during World War II. The United States Navy,
on the other hand, expresses its test results in T-scores.
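
All three of these scales are the same z-score restated with a different mean and standard deviation. The sketch below uses the means and standard deviations given in the text; the names `SCALES`, `from_z`, and `to_z` are ours.

```python
# Mean and standard deviation of each scale, as given in the text.
SCALES = {
    "T-score": (50, 10),
    "CEEB": (500, 100),
    "AGCT": (100, 20),
}

def from_z(z, scale):
    """Convert a z-score to the named standard score scale."""
    scale_mean, scale_sd = SCALES[scale]
    return scale_mean + scale_sd * z

def to_z(score, scale):
    """Convert a standard score on the named scale back to a z-score."""
    scale_mean, scale_sd = SCALES[scale]
    return (score - scale_mean) / scale_sd

print(from_z(2.5, "T-score"))   # T-score for z = +2.5
print(to_z(600, "CEEB"))        # z-score for a CEEB score of 600
print(from_z(1.0, "AGCT"))      # AGCT score one SD above the mean
```

A CEEB score of 600 is a z-score of +1, which is why it falls at the 84th percentile of a normal distribution.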

Stanines are another type of standard score; they were developed by the
United States Air Force in World War II. The word stanine was taken from
"STAndard NINE-point scale" (Durost, 1961a). Thus, a stanine scale is a
nine-point scale with a mean of 5 and a standard deviation of 2. The
distribution of stanines, as can be seen in Figure 11, is based upon the normal
curve. Note that just below the stanines are percentages which indicate the
per cent of the total found in each stanine.
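
The per cent of cases in each stanine follows from the normal curve once the stanine boundaries are fixed. The sketch below assumes the common convention that each stanine from 2 through 8 is half a standard deviation wide, with stanine 5 centered on the mean and stanines 1 and 9 taking the two tails; the function names are ours.

```python
from math import floor
from statistics import NormalDist

curve = NormalDist()  # standard normal curve

def stanine(z):
    """Stanine for a z-score: mean 5, SD 2, limited to the range 1 through 9."""
    return max(1, min(9, floor(2 * z + 5.5)))

def percent_in_stanine(s):
    """Per cent of a normal distribution falling in stanine s."""
    lower = 0.0 if s == 1 else curve.cdf((s - 5.5) / 2)   # open-ended bottom tail
    upper = 1.0 if s == 9 else curve.cdf((s - 4.5) / 2)   # open-ended top tail
    return 100 * (upper - lower)

print([round(percent_in_stanine(s)) for s in range(1, 10)])
print(stanine(0.0), stanine(1.0), stanine(-2.3))
```

Rounded to whole numbers, the percentages for stanines 1 through 9 are 4, 7, 12, 17, 20, 17, 12, 7, and 4.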

The stanine score is considered by many testing authorities as the preferred
method of explaining test results to students and parents and is gaining wide
acceptance in our schools today. This is because stanines are easily
understood and are broader in scope than other devices, yet they are precise
enough for the purpose of reporting test scores. It should also be noted that
in the area of research, stanines are easy to use because they are one-digit
numbers. When computers are used, they are economical as well because
they require only one column to signify a score on a punch card. They
immediately tell the test user the standing of the pupil. For example, a student
with stanines of 7, 8, or 9 is far above average in whatever measure is being
sought. On the other hand, stanines of 2 or 3 indicate he is well below
average. Figure 12 presents a ladder of stanines with the percentage of
children reaching each rung.


Though the stanine scale is an excellent method of reporting test results, it
has technical limitations. Magnusson (1967) states,

    The T-scale allows finer differentiation among individuals than
    the stanine scale. So long as a sufficiently high reliability justifies a
    stricter differentiation, we will lose a certain amount of information
    about the individuals by giving their results as stanine scores. For a
    reliability of 0.91 the standard error will be 0.3s, and for a reliability
    of 0.96 the standard error will be 0.2s. For a T-scale where s = 10,
    these figures indicate standard errors of 3 and 2 units respectively.
    The standard error is so small that the scale can be said to differentiate
    so accurately that one would lose valuable information if the results
    were to be given on a stanine scale instead of a T-scale [p. 241].
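
The standard errors in the quotation can be reproduced from the usual formula for the standard error of measurement, s multiplied by the square root of one minus the reliability. The sketch below assumes that formula is the one intended; the function name is ours, and the stanine column (s = 2) is added only for comparison.

```python
from math import sqrt

def standard_error(sd, reliability):
    """Standard error of measurement: sd * sqrt(1 - reliability)."""
    return sd * sqrt(1.0 - reliability)

for reliability in (0.91, 0.96):
    in_sd_units = standard_error(1.0, reliability)
    on_t_scale = standard_error(10.0, reliability)   # T-scale: s = 10
    on_stanines = standard_error(2.0, reliability)   # stanine scale: s = 2
    print(reliability, round(in_sd_units, 2), round(on_t_scale, 1), round(on_stanines, 1))
```

With a reliability of 0.91 the error is about 3 T-score units but well under one stanine unit, which illustrates why the coarser stanine scale cannot register differences the test is accurate enough to detect.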



Figure 12. Stanine ladder. (Reproduced from L. J. Karmel, Testing in Our
Schools. New York: The Macmillan Company, 1966. Copyright © 1966 by
Louis J. Karmel.)


The reader who is interested in pursuing the techniques of stanine
interpretation to parents and other lay persons should see Durost (1961a),
Engelhart (n.d.), and Karmel (1966).

The important thing to remember about standard scores is that they are
based on a normal distribution and should not be used for other data.
Standard scores are meaningful only in relationship to a particular group.
The standard deviation is that of a particular reference group, and an
individual's score is represented as the number of standard deviation units
from the mean of the group. The main advantage of the standard score over
percentile rankings is that standard scores are presented in equal steps or
units.




Intelligence Quotients 

A great deal of confusion, misunderstanding, and emotionalism has clouded
the concept of the intelligence quotient. Basically, the intelligence quotient
is no more than a formula for obtaining a type of score that was found
convenient for classification of individuals. The classical formula is:

\[
\text{Intelligence quotient (IQ)} = \frac{\text{Mental Age (MA)}}{\text{Chronological Age (CA)}} \times 100
\]

Theoretically the intelligence quotient will be 100 for the average person.
That is, if a ten-year-old child does as well on an intelligence test as average
ten-year-olds in the normative group, his mental age will be 10. Translated
into the formula we have the following:

\[ 100 \times \frac{10}{10} = 100 \]

If, on the other hand, a ten-year-old does as well on an intelligence test as
the average twelve-year-old, we have the following:

\[ 100 \times \frac{12}{10} = 120 \]

Let us look at one more example, that of a ten-year-old who has a lower
mental age than his chronological years.

\[ 100 \times \frac{8}{10} = 80 \]

Thus our first ten-year-old is average (IQ 100), our second ten-year-old is
above average (IQ 120), and our third ten-year-old is below average (IQ 80).

The intelligence quotient was developed originally by Wilhelm Stern for
use with the Binet test of individual intelligence. (See Chapter 7.) The
standard deviation of the American version of the Binet (Stanford-Binet) is
approximately 16 with a mean of 100. Advocates of the IQ have stated
that the standard deviation is relatively uniform from age to age; that is,
that a given IQ means the same thing, relative to age group, at any age.
Investigations, however, have indicated that the intelligence quotient does
not have the same standard deviation for different ages. As a result, IQ's
obtained at different ages may differ even though the relative positions may
be the same. Engelhart (1959) presented a system to obviate this problem.
His system allows for the equating of IQ's of different group intelligence
tests. The reader should bear in mind that an IQ on X test of intelligence is
not necessarily equal to an obtained IQ on Y test of intelligence. It is
possible, for example, that on X test the highest obtainable IQ would be 154
whereas on the Y test the highest obtainable IQ might be 164. Again scores
must be interpreted on the basis of relevant norms. The meaning of an IQ
must be interpreted in the light of the test from which it was derived.

The reader should not conclude from this discussion that IQ scores
derived from MA/CA are worthless. They are not and still may serve a valuable
function. They give us some relative information and if the user bears in
mind the standard deviation and range differences they can be useful tools.
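
The classical MA/CA formula discussed above is easily expressed as a short function. This is only a sketch of the arithmetic; the function name `ratio_iq` is ours.

```python
def ratio_iq(mental_age, chronological_age):
    """Classical intelligence quotient: 100 * MA / CA."""
    return 100 * mental_age / chronological_age

# The three ten-year-olds from the worked examples.
for mental_age in (10, 12, 8):
    print(f"MA {mental_age}, CA 10 -> IQ {ratio_iq(mental_age, 10):.0f}")
```

The same function shows that the eight-year-old with a mental age of 10, mentioned in the discussion of mental age, would have a ratio IQ of 125.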

Another word of caution is indicated. Do not confuse the previous
discussion of IQ with other methods of obtaining IQ scores. The Wechsler
tests of intelligence report results in an IQ format. The method used to
compute IQ in these tests, however, is very different from the classical
formula previously presented. The shortcomings of the IQ based on mental
age have been eliminated.

Let us look back at Figure 11. At the bottom you will see the Wechsler
Scales and below it the deviation IQ's. You can see that your knowledge of
standard scores will be useful in understanding these scores. A person's raw
score on each of the subtests is translated, using relevant norms, to a standard
score. This standard score, as you can see, is based on a mean of 10 and a
standard deviation of 3. The total standard scores (verbal scale, performance
scale, and full scale; see Chapter 7) are then converted into IQ's. An
inspection of Figure 11 shows that these IQ's are based on a mean of 100 and a
standard deviation of fifteen points. Thus, roughly 68 per cent of the IQ's
are within ±1 standard deviation of the mean, that is, between 85 and 115.
Among testing people this type of estimation of IQ is known as the
deviation IQ.
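
The scale arithmetic behind the deviation IQ can be sketched as follows. This shows only how a relative position (a z-score) is expressed on each scale; actual tests convert raw scores through published norm tables, and the function names here are ours. The standard deviation of 16 is the approximate Stanford-Binet figure mentioned earlier.

```python
def deviation_iq(z, sd=15):
    """Deviation IQ for a z-score: mean 100, standard deviation `sd`."""
    return 100 + sd * z

def subtest_scaled_score(z):
    """Wechsler-style subtest standard score: mean 10, standard deviation 3."""
    return 10 + 3 * z

for z in (-1, 0, 1, 2):
    print(z, subtest_scaled_score(z), deviation_iq(z), deviation_iq(z, sd=16))
```

A person two standard deviations above the mean obtains 130 on a scale with a standard deviation of 15 but 132 on one with a standard deviation of 16, which is one reason an IQ must be read in the light of the test from which it came.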

It should be noted that the deviation IQ does not differ from other
standard score procedures. A given IQ in the distribution will always have
the same relative position. A final note on this subject is of particular
importance. The 1960 revision of the Stanford-Binet substituted the deviation
IQ for the mental age concept. As you will recall, the formula of 100 MA/CA
was originally constructed for use with the Binet. Thus, the obvious trend
is toward the standard score, whether we call it IQ or T-score.
The Standards for Educational and Psychological Tests and Manuals (American
Psychological Association et al., 1966) states the following:

    Standard scores should in general be used in preference to other
    derived scores. The system of standard scores should be consistent
    with the purposes for which the test is intended, and should be
    described in detail in the test manual. The reasons for choosing that
    scale in preference to other scales should also be made clear [pp. 33-34].




Practical Usage of Norms 




There are two essential things to remember when trying to understand
individuals by testing them. The first is the appropriateness of the test for
the person and your basic purpose in testing; the second is to know how
others have performed on the test. The "best" and most appropriate test
that is valid and reliable will yield meaningless scores unless it is compared
with other scores (Seashore and Ricks, 1950). Norms provide us with a
frame of reference for comparison.

Norms are only meaningful if they are relevant. It does little good to
compare the quality of apples with that of oranges. Similarly, when Mary
is compared to a normative group it should be composed of individuals of
similar backgrounds. Standards (American Psychological Association et al.,
1966), for example, cites as an essential requirement that "Norms presented
in the test manual should refer to defined and clearly described populations.
These populations should be the groups to whom users of the test will
ordinarily wish to compare the persons tested" (p. 35). In illustrating this
essential requirement, Standards goes on to state,


    General aptitude tests designed for use with elementary school
    children might well present norms by grade-groups and by
    chronological age groups .... The manual should point out that a person
    who has a high degree of interest in a curriculum or occupation will
    generally have a much lower degree of interest when compared with
    persons actually engaged in that field. Thus a high percentile score
    on a scale reflecting mechanical interest, in which the examinee is
    compared with men-in-general, may be equivalent to a low percentile when
    the examinee is compared with auto mechanics [p. 35].


Teachers and counselors are often called upon to advise students about
college and their chances for admission. Sound and meaningful advice is
based on several factors, such as the student's high school grades, college
admission test scores, and the academic caliber of students at the desired
college. In a sense most of these are norms; that is, we compare relevant
factors of the student with relevant factors of the college. Put these together
and we have meaningful comparisons. Look at Tables 16 and 17 and note
the differences in median scores in each table.

Table 16 presents a general population of high school seniors. Table 17,
on the other hand, is a selected group composed of high school seniors
planning to go on to college.

If a student had standard scores of 18 in English, 16 in mathematics,
17 in social studies, and 19 in natural science, he would be average or above
(60th percentile, 57th percentile, 51st percentile, 64th percentile) compared
to the "unselected" seniors found in Table 16. If, on the other hand, we
look at Table 17 we find a dramatically different picture. Comparing this
same student to "college-bound" seniors, we find he is a great deal below
average (34th percentile, 28th percentile, 28th percentile, 39th percentile).


Table 16  Percentile Ranks of Unselected High School Seniors*

[Table body not recoverable.]


Table 17  Percentile Ranks of College-Bound High School Seniors*

[Table body not recoverable.]

Thus, in guiding a youngster or adult we must have appropriate norms. 
Using “unselected high school seniors” as our frame of reference, we would 
probably advise college or at least a junior college for our student. If we 
compare him to the appropriate norms, “college-bound high school seniors,” 
our suggestions would be quite different. 
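
The comparison just described can be laid out side by side. What follows is a minimal sketch in Python that uses only the percentile ranks quoted in the text for this one student. The function names, the dictionary layout, and the rule that a percentile rank of 50 or more counts as "average or above" are assumptions made for illustration; they are not taken from the ACT tables.

```python
# Percentile ranks quoted in the text for one student's standard scores
# (English 18, mathematics 16, social studies 17, natural science 19).
PERCENTILES = {
    "unselected seniors (Table 16)": {
        "English": 60, "Mathematics": 57,
        "Social Studies": 51, "Natural Science": 64,
    },
    "college-bound seniors (Table 17)": {
        "English": 34, "Mathematics": 28,
        "Social Studies": 28, "Natural Science": 39,
    },
}


def standing(percentile):
    """Assumed convention: the median (50th percentile) or higher is 'average or above'."""
    return "average or above" if percentile >= 50 else "below average"


def summarize(percentiles_by_group):
    """Return {norm group: {subject: standing}} for every norm group."""
    return {
        group: {subject: standing(rank) for subject, rank in ranks.items()}
        for group, ranks in percentiles_by_group.items()
    }


if __name__ == "__main__":
    for group, subjects in summarize(PERCENTILES).items():
        print(group)
        for subject, label in subjects.items():
            rank = PERCENTILES[group][subject]
            print(f"  {subject}: percentile rank {rank} ({label})")
```

The same four standard scores are labeled differently depending only on which norm group supplies the percentile ranks, which is the point of the illustration.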

One more illustration of the importance of relevant norms can be seen in 
Tables 18 and 19. Table 18 presents norms for high school seniors who later 
attended junior colleges or technical schools. Table 19, on the other hand, 
is made up of high school seniors who enrolled at institutions which offer 
bachelor’s, master’s, and doctor’s degrees; these institutions are primarily 
large public and private universities (The American College Testing 
Program, 1965). 

An inspection of Tables 18 and 19 reveals differences in median scores. 
Again we see that the normative or reference group is extremely important. 

Guidelines. Good norms are based upon representative and random 
samplings of the population for which the test has been constructed. The 
sheer quantity of the sample does not mean the norms are appropriate. 
A test, for example, based on 200,000 pupils in New York City would 
probably be good for New York, but would not necessarily be relevant for 
other geographic regions. Thus, in your review of a test manual do not 
accept alleged national norms unless they are supported by a cogent and 
complete analysis of the sample of people they represent. Expect and look 
for specific relevant evidence to support the test author’s claim for representa- 
tive samplings. 

School systems and other users of test data should develop their own 
local norms. These local norms should be revised periodically. This is 
especially relevant in the use of achievement tests. “Local norms are more 
important for many uses of tests than are published norms” (American 
Psychological Association et al., 1966, p. 34). Many test manuals describe 
the process of computing local norms. These local norms may be calculated 
by the same procedure used in determining national norms. 
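
One way a school might tabulate local percentile norms is sketched below in Python. This is an illustration under stated assumptions, not the procedure of any particular test manual: it uses the common midpoint convention, in which the percentile rank of a score is the percentage of pupils scoring below it plus half the percentage scoring exactly at it. The function name and the ten raw scores are hypothetical.

```python
from collections import Counter


def local_percentile_norms(raw_scores):
    """Map each observed raw score to a percentile rank within the local group.

    Assumed convention: PR = 100 * (number below + half the number at the score) / N.
    """
    n = len(raw_scores)
    counts = Counter(raw_scores)
    norms = {}
    below = 0  # running count of pupils scoring lower than the current score
    for score in sorted(counts):
        at_score = counts[score]
        norms[score] = round(100 * (below + 0.5 * at_score) / n)
        below += at_score
    return norms


if __name__ == "__main__":
    # Hypothetical raw scores for ten pupils in one local school system.
    scores = [10, 12, 12, 15, 18, 18, 18, 20, 22, 25]
    for score, rank in local_percentile_norms(scores).items():
        print(f"raw score {score}: percentile rank {rank}")
```

A real local norm table would rest on far more than ten pupils and, as the text advises, would be recomputed periodically.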

A list of some of the essential normative data that you should look for in a 
test manual follows:² 


1. Scales used for reporting scores should be thoroughly documented 
and explained in the test manual in order to facilitate test 
interpretation. 

2. Standard scores generally should be used for reporting raw 
scores. 

3. Tables for converting grade norms to standard scores or 
percentile ranks should be provided. 

4. Norms should in most cases be published in the test manual 
at the time of distribution. 

5. Standard scores or percentile ranks should reflect the distribution 
of scores in an appropriate reference group. 

6. Normative groups should be clearly defined and described. 

7. Method of sampling should be reported. 

8. Achievement test norms should report number of schools as 
well as number of students tested. 

9. Score variance because of such variables as age, sex, and 
education should be reported. 


² Based on Standards (American Psychological Association et al., 1966). 


Table 18 Percentile Ranks for College Freshmen Enrolled in Junior Colleges 
and Technical Schools* 

Table 19 Percentile Ranks for College Freshmen Enrolled in Doctoral-Granting 
Institutions* 

* © 1965 by American College Testing Program, Inc. Used by permission of 
the American College Testing Program, Inc. 


References 

American Psychological Association, American Educational Research Association, 
and National Council on Measurement in Education. Standards for educational 
and psychological tests and manuals. Washington, D.C.: American Psychological 
Association, 1966. 

Chase, C. I. Elementary statistical procedures. New York: McGraw-Hill, 1967. 

Durost, W. N. The characteristics, use, and computation of stanines. Test Service 
Notebook, No. 23. New York: Harcourt, Brace and World, 1962. (a) 

Durost, W. N. How to tell parents about standardized test results. Test Service 
Notebook, No. 26. New York: Harcourt, Brace and World, 1962. (b) 

Engelhart, M. D. Obtaining comparable scores on two or more tests. Educational 
and Psychological Measurement, 1959, 19, 55-64. 

Engelhart, M. D. Using stanines in interpreting test scores. Test Service Notebook, 
No. 28. New York: Test Department, Harcourt, Brace and World (n.d.). 

Gourevitch, V. Statistical methods: A problem-solving approach. Boston: Allyn and 
Bacon, 1965. 

Guilford, J. P. Fundamental statistics in psychology and education. (4th ed.) New 
York: McGraw-Hill, 1965. 

Karmel, L. J. Testing in our schools. New York: Macmillan, 1966. 

Magnusson, D. Test theory. Reading, Mass.: Addison-Wesley, 1967. 

Popham, W. J. Educational statistics: Use and interpretation. New York: Harper 

Seashore, H. G., and Ricks, J. H. Norms must be relevant. Test Service Bulletin, 
No. 39. New York: The Psychological Corporation, 1950. 

Terman, L. M., and Merrill, M. A. Stanford-Binet Intelligence Scale: Manual for 
the third revision, Form L-M. Boston: Houghton Mifflin, 1960. 



CHAPTER 


6 

Sources of Test Information 


Teachers and other school personnel need to know about tests — where to 
find test information, where to find critiques of tests, where to find testing 
books for parents, and so on. This chapter will attempt to provide this and 
other information. In addition, test resource materials will be listed. 


Special Resources 

We shall discuss three of the most widely used resources in testing: 
Standards for Educational and Psychological Tests and Manuals, The Mental 
Measurements Yearbooks, and Tests in Print. These references are the basis 
for definitive test information in the field of measurement. 

Standards for Educational and Psychological Tests and Manuals 

Standards is a forty-page booklet published by the American Psychological 
Association et al. (1966) and has been frequently mentioned in this book. 
There is little need to elaborate its essential characteristics at this 
point. It is important, however, to state that Standards represents the ... 
educators and as such ... Though there are no legal requirements to make one 
adhere to these “rules,” ... Standards quite seriously. Every school should 
have a copy in its testing library and should make use of it when evaluating 
a test. 


The Mental Measurements Yearbooks 

The Mental Measurements Yearbooks, edited by Oscar K. Buros (1938, 
1940, 1949, 1953, 1959, 1965), are the most valuable and important single 
source of information about tests. The Yearbooks contain detailed information 
and critical reviews on thousands of tests. An attempt is made to list, discuss, 
and criticize every published standardized test. 

Each critical review is written by an expert in the general field of 
measurement and in the specific area of his or her special competency. Thus, a 
measurement specialist in school ability and achievement testing would review 
tests in that particular area only. The number of reviews for each test is based 
on the interest which that area of testing generates and the extensiveness of the 
test’s utilization. Some tests of wide general interest and use, therefore, may 
be reviewed by two or more experts. The reviewers provide frank, detailed, 
and critical information about each test. The following is an example of 
the type of review contained in the Yearbooks; it is excerpted from a review 
by Dr. Jonathon C. McLendon, one of many reviewers of the Sequential 
Tests of Educational Progress (STEP). McLendon (1965) states, 


The STEP tests in social studies continue without peer, indeed 
almost without available counterparts, as the leading standardized 
series of skill tests in social studies. As previous reviewers have 
indicated, the STEP tests fulfill a distinctive need in social studies, a 
field in which tests have generally dealt mostly or only with knowledge 
and understanding of facts and concepts. 

Content validity of the STEP tests is dependent on the soundness 
of judgment of those three dozen persons who participated in the 
test construction. While this group included several outstanding 
teachers and other leaders in social studies education, additional 
evidences of content validity would be welcome. In light of the heavy 
emphasis that teachers place on interpretation of reading materials in 
social studies, content validity is weakened by the extent (37 to 51 
per cent) of items that involve interpretation of visual materials. 
Data on item validity are not reported. Construct, concurrent, and 
predictive validity are evident only by implication. Correlations with 
SCAT scores are interesting, and useful in some ways; but more 
closely related criteria would serve better to guide teachers and 
students in interpreting and applying test results. Ideally, the reporting 
of scores would facilitate recognition of levels of achievement by 
individuals and groups in the use of particular skills involving 
specified types of instructional materials or sources. 

Although the STEP tests aim chiefly to measure indicated abilities, 
previous knowledge concerning the subject matter presented on the 
test doubtless aids many test takers. The seven types of skills and 
eight areas of understanding listed in the Manual for Interpreting 
Scores on the social studies tests provide no more than general and 
somewhat vague identifications of related behaviors; the statements of 
understandings appear to restate several proposed in 1957 by the 
Committee on Concepts and Values, National Council for the Social 
Studies. Hopefully the publishers of the STEP tests will be able to 
furnish in the foreseeable future, as promised seven years ago in 
their 1957 Technical Report, “empirical checks .... relating test 
scores to suitable criterion measures,” which data have not yet 
appeared in the SCAT-STEP Supplements [p. 1224]. 


In addition to test reviews the Yearbooks provide other important and 
practical items of information such as the test author; test publisher; norms, 
for example, appropriate grade and/or age levels; prices of tests; publication 
and revision dates; administration time; and available forms. In addition, 
references are listed. 

Every school or other institution using tests should have at least the last 
three editions of the Yearbooks. The first source that school testing people 
and testing specialists consult for testing information is the latest edition of 
the Yearbooks. A new test is usually reviewed in the first Yearbook that was 
published after the test was distributed for popular use. Some tests are 
continually reviewed in subsequent Yearbooks; others are not. Thus, for an 
exhaustive and detailed account one should have all the Yearbooks; for most 
school purposes, however, the last three Yearbooks should suffice. 


Tests in Print 

Another book that is useful and should be in every school is Tests in Print 
(Buros, 1961). It contains all the tests that one would find in the Yearbooks, 
test catalogues, journals, books, and other sources. The list includes over 
2,000 tests that are in print as well as over 800 tests that are no longer available. 
This book serves primarily as a reference guide to enable you to find out where 
to obtain critical and detailed data on each listed test. In addition to references 
for further investigation it presents data on each test— for example, author, 
publisher, norms, forms, and any special features. 

Texts and Reference Books 

No single textbook can adequately cover in detail all the special areas of 
testing. Certain texts are written for special areas of testing or for a special 
audience. The following books have been selected as sources for more detailed 
investigations and study. It should be noted that the list is not exhaustive. 


Intelligence Tests 


Williams and Wilkins, 1938. Primarily centered on the author’s intelligence test. 
Useful concepts about the nature of intelligence and IQ. Details of test 
construction and classification of intelligence as well as diagnostic and practical 
applications of IQ results. 

Measurement of intelligence by drawings. (copyright renewed 1954) New York: 
Harcourt, Brace and World, 1926. Description of method for measuring 
intelligence by the use of drawings. Especially useful for primary 
teachers and counselors. 

Terman, L. M., and Merrill, M. A. Stanford-Binet Intelligence Scale: Manual for 
the third revision, Form L-M. Boston: Houghton Mifflin, 1960. Essentially the 
manual for the Binet but gives the reader some knowledge of thorough test 
construction and a detailed idea of what IQ is. 


School Tests 

Ilg, F. L., and Ames, L. B. School readiness: Behavior tests used at the Gesell Institute. 
New York: Harper & Row, 1965. Combined text and manual presents Gesell 
Institute’s view that children should be enrolled in school on the basis of 
development, not on chronological age or IQ. Description of tests used and how to 
administer and use results. One section of book is devoted to teachers, 
administrators, and parents. 

Bauernfeind, R. H. Building a school testing program. Boston: Houghton Mifflin, 
1963. Review of some basic test concepts plus detailed outline for testing 
programs from K-12. 


Specialized Psychological Testing 

Welsh, G. S., and Dahlstrom, W. G. Basic readings on the MMPI in psychology and 
medicine. Minneapolis: University of Minnesota Press, 1956. The editors of this 
book present an excellent source of technical information on the various uses of 
the Minnesota Multiphasic Personality Inventory (MMPI) by leading psycho- 

Burgemeister, B. B. Psychological techniques in neurological diagnosis. New York: 
Harper & Row, 1962. Presentation of psychological techniques in the diagnosis 
of central nervous system disorders. Reveals the tools of the psychologist in 
detecting these disturbances. 

Levy, L. H. Psychological interpretation. New York: Holt, Rinehart and Winston, 
1963. A book dealing with the nature of psychological interpretation and how the 
psychologist uses tools such as tests to derive his analysis of behavior. 

Guidance Counseling 

Goldman, L. Using tests in counseling. ... An excellent guide to interpretation, 
test selection, and research findings in measurement. Intended to help school 
counselors in the uses of tests. 

Berdie, R. F., Layton, W. L., Swanson, E. O., and Hagenah, T. Testing in guidance 
and counseling. New York: McGraw-Hill, 1963. Devoted to helping 
counselors understand pupils using tests as only one method in this task. Other 
factors, such as family background and physical health, are considered along 
with tests to give a complete picture of the student. 


Vocational Testing 

Super, D. E., and Crites, J. O. Appraising vocational fitness. New York: Harper & 
Row, 1962. A compilation and evaluation of useful tests for the identification of 
vocational aptitudes and skills. The book discusses each test in terms of 
applicability, contents, administration and scoring, norms, and so forth. 
Individual case histories and their eventual disposition are also given. 

Miscellaneous 

Allen, R. M., and Jefferson, T. W. Psychological evaluation of the cerebral palsied 
person: Intellectual, personality, and vocational applications. Springfield, Ill.: 
Charles C. Thomas, Publisher, 1962. Significant tests for use with cerebral-palsied 
persons are outlined and discussed. 

Albright, L. E., Glennon, J. R., and Smith, W. J. The use of psychological tests in 
industry. Cleveland: Howard Allen, Inc., Publishers, 1963. Primarily devoted to 
selection problems, especially useful as a reference for personnel officers. 

Evaluation of pupil progress in business education. American Business Education 
Yearbook, Vol. 17, 1960. New York: New York University Campus Stores, 
1960. Especially useful for testing students in business education. 

Ismail, A. H., and Gruber, J. J. Integrated development of motor aptitudes and 
intellectual performance. Columbus, Ohio: Charles E. Merrill, 1967. An excellent 
source book for motor aptitude tests and their relationship to intellectual 
functioning. 

Clark, H. H. Application of measurement to health and physical education. (4th ed.) 
Englewood Cliffs, N.J.: Prentice-Hall, 1967. Detailed description of various 
performance tests in physical education; paper-and-pencil tests for sports 
knowledge and health education. A measurement classic in the field of physical 
education. 

The reader is again cautioned to bear in mind that the preceding list is 
not exhaustive and is intended only as a further guide to source material in 
specific areas of testing. Moreover, some of these references may no longer 
be pertinent when you consult them. Do not reject a new test or any test 
because it is not listed in this book or one of the references. Remember that 
authors are selective and present tests they think are valuable. In the final 
analysis it is your job to select tests appropriate for your needs. 


Test Publishers 

One of the most important sources of test information, especially about a 
specific test, is the publisher. (See Appendix A for a complete list of these 
publishers and their addresses.) The test publisher, upon request, will send 
a catalogue of his tests that lists pertinent data such as price, norms, time 
necessary to administer a given test, and number of available forms. Specific 
and detailed information may be obtained by requesting a “specimen set” 
from the publisher. The “specimen set” is generally quite inexpensive or 
given free of charge. It usually includes the test, answer sheet, scoring 
stencil, directions for administering and scoring, and manual. 

The “specimen set” facilitates evaluation of the test by first-hand 
inspection. The manual usually contains data on how the test was constructed 
and standardized, as well as norms and interpretive suggestions. It is the 
most contemporary source of test information, but of course one must be 
judicious in evaluating its contents. It is asking too much to expect the test 
publisher to be unbiased in his reporting of the test's limitations and assets. 
Test publishers can be very useful as a source of information if used together 
with your own critical evaluations and those of testing experts, like those 
found in the Yearbooks. 


Many test publishers in addition to selling tests also distribute advisory 
information concerning the whole field of testing. This information is usually 
written in an easy to understand way and is geared to the practicing school 
teacher and counselor or anyone else who uses tests in day-to-day practice. 
These advisory services are almost always objective and unbiased reports 
of tests and factors affecting tests in terms of construction and use. These 
publications that deal with commonly encountered measurement problems 
are available free of charge. Three of the most active of these publishers are 
listed: 


1. The Psychological Corporation, 304 East 45th Street, New York, 
New York 10017. The Psychological Corporation publishes from 
time to time the Test Service Bulletin. These bulletins generally 
contain a three-to-five-page article on some facet of tests or testing. 
There are currently over twenty-two bulletins available to teachers, 
students, and schools. Examples of some of these bulletins are the 
following: No. 36, “What Is an Aptitude?”; No. 38, “Expectancy 
Tables—A Way of Interpreting Test Validity”; No. 39, “Norms 
Must Be Relevant”; No. 54, “On Telling Parents About Test 
Results”; No. 55, “The Identification of the Gifted”; No. 56, 
“Double-Entry Expectancy Tables”; No. 57, “Testing Job Applicants 
from Disadvantaged Groups.” Some of the articles are bound 
together in one bulletin. 

2. Educational Testing Service, Princeton, New Jersey 08540. 
This publisher produces a series entitled Evaluation and Advisory 
Service Series. At present this series includes four booklets. They are: 
No. 1, “Locating Information on Educational Measurement: Sources 
and References”; No. 3, “Selecting an Achievement Test: Principles 
and Procedures”; No. 4, “Making the Classroom Test: A Guide for 
Teachers”; No. 5, “Short-cut Statistics for Teacher-made Tests.” 
In addition to these, several other booklets and materials are provided 
free of charge to students, teachers, and schools. 

3. Harcourt, Brace and World, Inc., 757 Third Avenue, New York, 
New York 10017. This publisher produces two series of test 
advisory publications, Test Service Notebook and Test Service Bulletin. 
The notebooks are generally four pages long and focus on subjects 
related to test theory, administration of testing programs, 
research studies, and correct test usage. Examples of some of the 
currently available Notebooks are No. 11, “A Comparison of Results 
of Three Intelligence Tests”; No. 13, “A Glossary of 100 Measurement 
Terms”; No. 17, “Why Do We Test Your Children?”; No. 
20, “Testing in the Secondary School”; No. 23, “The Characteristics, 
Use, and Computation of Stanines”; No. 25, “How Is a Test Built?”; 
No. 27, “Fundamentals of Testing: For Parents, School Boards, and 
Teachers.” 

The Test Service Bulletin generally offers brief reports on 
effective testing programs and discussions of testing concepts. 
Examples of some of the currently available bulletins are No. 77, “The 
Intelligence Quotient”; No. 79, “Misconceptions About Intelligence 
Testing”; No. 91, “Finding Mathematics and Science Talent in the 
Junior High School”; No. 94, “Aptitude and Achievement Measures 
in Predicting High School Academic Success”; No. 95, “Testing-Tool 
for Curriculum Development”; No. 99, “Selection and Provision 
of Testing Materials”; No. 102, “Test Administration Guide.” 


Journals 

The Yearbooks, though very valuable, are published only every five or six 
years. This means that a more contemporary source of test evaluation is 
needed. One such source is the professional journals that review new tests. 
The following journals are of particular interest to test users: 

1. Personnel and Guidance Journal. 

2. Educational and Psychological Measurement. 

3. Journal of Consulting Psychology. 

4. Personnel Psychology. 

5. Review of Educational Research. 

6. Journal of Educational Psychology. 

7. Measurement and Evaluation in Guidance. 

8. Journal of Educational Measurement. 

9. Journal of Applied Psychology. 

10. Journal of Clinical Psychology. 

11. Journal of Counseling Psychology. 



14. Psychology in the Schools. 

15. Psychological Bulletin. 

16. Psychometrika. 

17. American Educational Research Journal. 

18. Journal of Experimental Education. 

19. Contemporary Psychology. 


Psychological Abstracts and Educational Index 

The Psychological Abstracts and the Educational Index are excellent 
bibliographic references in their respective fields. The Psychological Abstracts 
serve as the general guidelines for all psychological journals. Each issue contains 
a brief (abstract) summary of every report in the psychological journals, 
including the subject and important features of the report. Every year a 
subject and author index is given for convenience in locating material 
published in the previous year. The Psychological Abstracts also publish 
monthly listings of new tests. In addition, data concerning research use of 
tests and resultant findings are also presented. 

The Educational Index provides a very wide listing of journal articles in the 
field of education. It includes lay as well as professional materials and 
discussions. It does not give analyses as detailed as those in the Psychological 
Abstracts. Only references are given. One could find, for example, under 
“achievement testing” lists of tests and articles dealing with this area or, for 
that matter, any other sphere of measurement relating to education. 

The intelligent use of tests demands the utilization of all appropriate 
sources of test data. The combination of Psychological Abstracts and the 
Educational Index, plus the sources previously mentioned, should provide 
the test user with the necessary data to make intelligent choices in his evalua- 
tion and selection of measurement instruments. 


References 

American Psychological Association, American Educational Research Association, 
and National Council on Measurement in Education. Standards for educational 
and psychological tests and manuals. Washington, D.C.: American Psychological 
Association, Inc., 1966. 

Buros, O. K. (Ed.) The 1938 mental measurements yearbook. New Brunswick, N.J.: 
Rutgers University Press, 1938. 

Buros, O. K. (Ed.) The nineteen forty mental measurements yearbook. New Brunswick, 

Buros, O. K. (Ed.) The third mental measurements yearbook. New Brunswick, N.J.: 
Rutgers University Press, 1949. 

Buros, O. K. (Ed.) The fourth mental measurements yearbook. Highland Park, N.J. 

Buros, O. K. (Ed.) The fifth mental measurements yearbook. Highland Park, N.J. 

Buros, O. K. (Ed.) The sixth mental measurements yearbook. Highland Park, N.J. 

McLendon, J. C. Sequential tests of educational progress. In O. K. 
Buros (Ed.), The sixth mental measurements yearbook. Highland Park, N.J.: 
Gryphon Press, 1965. 



PART THREE 

Individual Tests 
of Intelligence 
and Personality 


CHAPTER 

7 

Individual Tests 
of Intelligence 


“The Greeks had a word for it, but the Romans had a word with better 
survival properties. Regardless of the word, what is now called intelligence 
has been talked about for at least 2,000 years. And as long as 2,000 years 
before the advent of attempts to measure intelligence, there seems to have 
been recognition of the fact that individuals differ in intellectual ability” 
(McNemar, 1966, p. 180). 

The teacher and counselor will rarely administer and interpret individual 
tests of intelligence such as the Stanford-Binet. He may, however, use these 
test results in many ways to facilitate learning in his classroom. In addition, 
group tests of intelligence (more appropriately termed scholastic aptitude 
tests), which he will administer and in some cases score, are based in large 
measure on the validity of the individual test of intelligence. 

In this chapter we will discuss, in detail, two of the most widely used 
individual tests of intelligence, the Stanford-Binet and the Wechsler series 
(WPPSI, WISC, WAIS) as well as nonlanguage, culture-fair, and infant 
tests. In Chapter 2 we discussed the problem of defining intelligence. You 
will remember that intelligence is defined differently by various psychologists 
and that the term intelligence has no absolute meaning. The meaning 
of IQ was also discussed in general terms and later in Chapter 5 the mathe- 
matical derivation was presented. Before reading further it may be beneficial 
for you to review relevant sections of Chapters 2 and 5. 



Stanford-Binet Intelligence Scale 


History 


In about 1890 the French psychologist Alfred Binet became interested in 
investigating reasoning and judgment. He wanted to know the ways in which 
“smart” and “dull” children differed. In his attempts to study these 
differences he used many types of measures, including size of cranium, tactile 
discrimination, and digit recall. These measures produced little relationship 
to general mental functioning. 

In 1904 Dr. Binet was appointed to a commission which was to study and 
recommend to the educational leaders of Paris a procedure for determining 
which children were unable to profit from a regular school setting. The 
commission was interested in picking out pupils who were in the mentally 
retarded range and placing them in a school which would provide instruction 
at their level of ability. 

Binet was asked to produce a method to distinguish these retarded children 
from “normal” pupils. His first scale, published in 1905 in collaboration with 
Simon, drew on the knowledge gained in his earlier studies. This scale, 
called the 1905 scale, was designed to cover various functions which Binet 
considered components of intelligence. These included comprehension, 
reasoning, and judgment. Children between three and six years of age were 
called upon to give their names, copy figures, point to their right and left 
ear, and obey simple commands. Some of the tasks older children were 
asked to do were to name the months of the year, make up sentences, define 
abstract words, and name various coins. 

In 1908 and 1911 revisions of the 1905 scale were published. Chauncey 
and Dobbin (1966) in discussing Binet’s work made an important observation 
not only on Binet’s procedures but on intelligence tests in general. They state, 


No test or technique measures mental ability directly. What Binet 
did, and what all other “intelligence test” builders after him have 
done, was to set up some tasks for the young intellect to attack and 
then to observe what happened when the intellect was put to work 
on them. His method was truly scientific and remarkably like the 
method used by physicists forty years later to detect and measure the 
forces released by the atom. The cloud chamber does not permit the 
physicist to see the atom or its electrically charged components, but 
it does reveal the tracks of ionizing particles and thus permits the 
scientist to deduce the nature of the atom from which the particles 
emanate [p. 5]. 


Lewis M. Terman, an American psychologist, began to experiment with 
the Binet tests in 1910. In 1916 he produced the Stanford Revision of the 
Binet Scale (Terman, 1916). This revision attempted to provide standards of 
intellectual performance for “average” American-born children from three 
to sixteen. Intelligence ratings were arrived at by mental age scores. Terman 
increased the number of tasks from Binet’s original 54 to 90. The 1916 scale, 
for the first time, included detailed instructions for administering and 
scoring each subtest. 

The 1916 scale was used for clinical diagnosis and research purposes 
during the 1920’s and early 1930’s. It was found that certain tests had low 
validity and that below the mental age of four and at the young adult levels 
the sampling had been inadequate. In addition, instructions and scoring 
lacked the precision needed for objective appraisal. In order to eliminate 
these faults a second revision of the Stanford scale was produced. 

The second revision, published in 1937, utilized the results of past studies, 
personal experiences, and a ten-year research and standardization project. Dr. 
Maud A. Merrill coauthored the 1937 revision with Terman (Terman and 
Merrill, 1937). This revision retained many of the characteristics of the 
earlier tests, such as age standards. It provided a broader sampling but 
remained a test of “general intelligence” rather than a test of specific kinds 
of abilities. Two forms, Form L and Form M, were used. In terms of 
sampling and statistical techniques the 1937 scale was much more 
sophisticated. 

The third and latest revision. Form L-M, was published in 1960. This 
revision combines the best features and subtests of the 1937 scale into a 
single form. The most radical change in the 1960 scale is in the IQ 
tables, which give deviation, or standard score, IQ’s. This is a departure 
from the previous method of MA/CA × 100. (See Chapter 5.) The revised 
IQ is a standard score with a mean of 100 and a standard deviation of 16 
(Terman and Merrill, 1960). 
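
The difference between the two kinds of IQ can be shown with a little arithmetic. The Python sketch below is an illustration only: the 1960 manual supplies deviation IQ's through published tables, whereas the second function here simply applies the general standard-score idea (mean 100, standard deviation 16) to a hypothetical age-group mean and standard deviation. The function names and all of the numbers in the example are assumptions.

```python
def ratio_iq(mental_age_months, chronological_age_months):
    """Older ratio method: IQ = MA / CA x 100 (both ages in the same units)."""
    return 100 * mental_age_months / chronological_age_months


def deviation_iq(score, age_group_mean, age_group_sd, iq_mean=100, iq_sd=16):
    """Standard-score IQ: distance from the age-group mean, in SD units,
    rescaled to a mean of 100 and a standard deviation of 16."""
    z = (score - age_group_mean) / age_group_sd
    return iq_mean + iq_sd * z


if __name__ == "__main__":
    # Hypothetical child: mental age 10 years (120 months), age 8 years (96 months).
    print(ratio_iq(120, 96))
    # Hypothetical score one standard deviation above the age-group mean.
    print(deviation_iq(score=60, age_group_mean=50, age_group_sd=10))
```

Under the standard-score approach a child who stands one standard deviation above the mean of his own age group receives the same IQ at every age, which the ratio method does not guarantee.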


Characteristics 

The Stanford-Binet Intelligence Scale begins with tests for the average
two-year-old and progresses to levels that differentiate between average and
superior adults. In order to illustrate the actual content of the test, excerpts
from four different age levels are presented with brief explanations.1

1 Lewis M. Terman and Maud A. Merrill. Stanford-Binet Intelligence Scale:
Manual for the Third Revision Form L-M. Copyright © 1960, by Houghton Mifflin
Company. Reproduced with the permission of Houghton Mifflin Company.

* An asterisk denotes that the test is used at two or more age levels.


YEAR TWO

1. Three-Hole Form Board*

(Material: Form board 5 in. x 8 in. with three insets for circle,
square, and triangle.)

Procedure: The board is presented with the blocks in place.
The Examiner† tells the child to watch him and he proceeds to
remove the blocks, placing each on the table before its appropriate
recess on the side toward the child. He then says, “Now put them
back into their holes.”

Score: To receive credit, child must place all three blocks correctly
in one of two trials.

2. Delayed Response

(Material: Three small pasteboard boxes and a small toy cat.)
Procedure: E: “Look, I am going to hide the kitty and then see if
you can find it again.”

Score: The child watches the E place the kitty under each box and
is asked to find the kitty. If on any of the three trials the child
turns over more than one box, that trial is scored as a minus.

3. Identifying Parts of the Body*

Procedure: E: “Show me the dolly’s hair.” (“mouth,” etc., on a
paper doll.)

Score: The child must identify the parts on the paper doll.

4. Block Building: Tower

Procedure: Twelve 1-inch cubes are placed in erratic order before
the child. The E proceeds to build a four-block tower, saying,
“See what I’m making!” He then pushes the rest of the blocks
toward the child, saying, “You make one like this.”

Score: Child must build a tower of four or more blocks in response
to E’s request.

5. Picture Vocabulary*

Procedure: “What’s this? What do you call it?” (Eighteen 2 in. x
4 in. cards with pictures of common objects.)

Score: Recognition, e.g., plane, telephone, etc.

6. Word Combinations*

Procedure: Notation of child’s spontaneous word combinations
during interview.

Score: Combinations of at least two words. For example, “Mama
bye bye” or “all gone” are scored plus, while one-word combinations
such as “bye bye” or “night-night” are scored minus.


† From this point on, Examiner will be designated by the letter E.


YEAR SIX

1. Vocabulary

Procedure: “When I say a word, you tell me what it means. What
is an orange?” (44 words; E stops after six consecutive words
have been failed.)

Score: Importance of knowing meaning, e.g., for orange,
correct answers are: “a fruit,” “like a tangerine,” “it’s round,
yellow.”

2. Differences*

Procedure: “What is the difference between a bird and a dog?”
Score: For example, correct responses are: “a bird flies and a dog
runs” or “a bird got wings and a dog got ears.” Incorrect responses:
“a dog chases a bird” or “a bird is white and a dog is brown.”
(Many more combinations for correct and incorrect responses are
possible.)

3. Mutilated Pictures

Procedure: (Five pictures with a part missing) “What is gone in this
picture?” or “What part is gone?”

Score: Missing part must be named or described verbally. Credit
is not received for pointing.

4. Number Concepts

Procedure: (. . . cubes) “Give me three blocks. Put . . .”

Score: Five different number combinations of blocks are requested.

5. Opposite Analogies

. . .

Score: A correct response for the above would be “glass” or “glass and wood.”

6. Maze

Procedure: (Mazes with . . . points marked) “This
little boy lives here, . . . schoolhouse. The little boy
wants to go to school the shortest way without getting off the
sidewalk. . . . your pencil . . . go off . . .
the little boy to school . . .”

Score: . . . marking is more . . . of the path.


YEAR ELEVEN

1. Memory for Designs*

Procedure: “This card has two drawings on it. I am going to show
them to you for ten seconds, then I will take the card away and let
you draw from memory what you have seen. Be sure to look at
both drawings carefully.” (Card is shown for ten seconds.)
Score: Degrees of accuracy represented in full and half credit.

2. Verbal Absurdities*

Procedure: . . . For each the question, “What is foolish about that?”

Score: Three “foolish” statements are read and quality of insight
. . . evaluated. For example, . . . the absurdity is . . . to be hanged, and . . .

3. Abstract Words*

Procedure: “What is connection?”

Score: Five words given and response evaluated on meaning, e.g.,
some correct responses for connection are “connect two things
together,” “you’re kin to ’em.”

4. Memory for Sentences

Procedure: “Now listen, and be sure to say exactly what I say.”
Score: E reads aloud two statements (separately) and child is to
give back statement verbatim. In order to receive credit statements
must be exactly correct with no omissions or additions or change
in order of words.

5. Problem Situation

Procedure: “Listen, and see if you can understand what I read.”
Score: Statement is read and child is asked question about it.

6. Similarities*

Procedure: “In what way are . . . , . . . , and . . . alike?”

Score: Three things are presented and a response that reveals
understanding, whether basic or superficial, is scored correct.

SUPERIOR ADULT LEVEL THREE 

1. Vocabulary* 

Procedure : “I want to find out how many words you know. Listen, 
and when I say a word, you tell me what it means. What is an 
orange?” (begins at six-year level) 

2. Proverbs

Procedure: “Here is a proverb, and you are supposed to tell what
it means. For example, this proverb, ‘Large oaks from little acorns
grow,’ means that great things may have small beginnings. What
does this one mean?” (Three proverbs are then given.)

Score: Lists of correct and incorrect interpretations are given.

3. Opposite Analogies*

Procedure: “A rabbit is timid; a lion is ...”

Score: List of correct and incorrect responses. Three analogies
are given.

4. Orientation

Procedure: The person is given a card on which two problems
concerning directions are given. He is not allowed to use pencil
and paper.

Score: Correct responses.

5. Reasoning*

Procedure: Person is presented with card and reads problem while
E reads it aloud. Pencil and paper are not allowed.

Score: Time limit and possible correct answers through different
mathematical methods (2) are given.

6. Repeating Thought of Passage*

Procedure: “I am going to read a short paragraph. When I am
through you are to repeat as much of it as you can. You don’t
need to remember the exact words, but listen carefully so that you
can tell me everything it says.”

Score: Accurate reproduction of component ideas.

The preceding illustrations are representative of some of the tasks in the 
Stanford-Binet. You can see that sometimes the tasks at different age levels 
are completely different, whereas in other cases they are the same. (The 
asterisk denotes usage at more than one level.) Many of the tests at the lower 
age levels deal with objects and pictures, whereas at the upper levels the tests 
are more abstract and verbal. Such abilities as judgment, interpretation,
memory, past achievement, and abstract reasoning are evaluated. 

The examiner, in administering the test, begins at a level where the child
with some effort is likely to succeed. Remember that the tasks at a given age
level reflect the average child’s ability at that age. If the child is unable to
pass the tasks at the level first tried, the examiner will go back to an easier
level. If the child is successful at the initial level, the examiner will continue,
level by level, until the child fails all tests at a specific level. Once this level has
been established, the examiner credits the child with the basal age, which is
the highest age level at which all of the tests are passed. For example, if all
tests up to and including the fifth year are passed, and one test for the sixth
year is not passed, the basal age is five years. The examiner also credits the
child with tasks passed at more advanced levels. Thus, if there are six tests
at each year-age level, a child passing a single test obtains credit for two
months of mental age. For example, Ted S. passed all tasks at the five-year
level, three of six tasks at the six-year level, one of six tasks at the seven-year
level, and failed all tasks at the eight-year level. Thus, the following computation
to derive his mental age would be made:

1. Passed all tasks at five-year level = five years basal age. 

2. Passed three of six tasks at six-year level = six months credit. 

3. Passed one of six tasks at seven-year level = two months credit. 

4. Failed all tasks at eight-year level = 0.

Mental age = five years, eight months. 
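
Ted's computation can be written out as a short Python sketch. The function name and argument layout are hypothetical, and the sketch assumes, as in the example above, that every year level has six tests worth two months each.

```python
def mental_age(basal_age_years: int, passes_above_basal: list[int],
               tests_per_level: int = 6) -> tuple[int, int]:
    """Return mental age as (years, months).

    passes_above_basal lists the number of tests passed at each level
    above the basal age, in order, ending with the level failed entirely.
    """
    if 12 % tests_per_level != 0:
        raise ValueError("tests per level must divide twelve months evenly")
    months_per_test = 12 // tests_per_level  # six tests -> two months each
    credit_months = months_per_test * sum(passes_above_basal)
    total_months = basal_age_years * 12 + credit_months
    return divmod(total_months, 12)


# Ted S.: basal age five; 3, 1, and 0 tests passed at years six, seven, eight.
years, months = mental_age(5, [3, 1, 0])
print(f"Mental age = {years} years, {months} months")
```

Run on Ted's record, the sketch reproduces the five years, eight months worked out by hand.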

Ted's mental age describes the level at which he is performing. This, of
course, does not take into account his life age. Ted's performance in relation
to children of his own age is then expressed as an IQ. An IQ has the same
meaning at one age as at any other. In order to find Ted's IQ, the examiner
would consult a table to convert the mental age to IQ.

Classifying Binet IQ's

The distribution of the 1937 standardization sample, as illustrated in
Table 20, is still used in the 1960 revision as a frame of reference. It presents
a basis for statistical classification of IQ’s. As Terman and Merrill (1960)
state,


The classificatory terms used carry no implications of diagnostic
significance for IQ categories. “Average or normal” has statistical
meaning as designating the middle range of IQ’s. So, too, IQ’s 60
and below indicate “mental deficiency” with respect to average
mentality on the scale and carry no necessary diagnostic implications
such as are usually attached to the term “feeblemindedness.” “Very 
superior” is applied to subjects whose IQ’s fall well within the top 
1.5 per cent of the group. . . . The table serves as a “frame of 
reference” to indicate how high or low any specific score is in relation 
to the general population [pp. 17 and 19]. 


Table 20 Distribution of the 1937 Standardization Group*

IQ         Per Cent   Classification

160-169      0.03     Very superior
150-159      0.2      Very superior
140-149      1.1      Very superior
130-139      3.1      Superior
120-129      8.2      Superior
110-119     18.1      High average
100-109     23.5      Normal or average
90-99       23.0      Normal or average
80-89       14.5      Low average
70-79        5.6      Borderline defective
60-69        2.0      Mentally defective
50-59        0.4      Mentally defective
40-49        0.2      Mentally defective
30-39        0.03     Mentally defective

* From Lewis M. Terman and Maud A. Merrill. Stanford-Binet
Intelligence Scale: Manual for the Third Revision Form L-M.
Copyright © 1960, by Houghton Mifflin Company. Reproduced
with the permission of Houghton Mifflin Company.
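
As a rough check on how the classifications divide the standardization group, the Table 20 percentages can be totaled by category. The sketch below is illustrative only; the variable and function names are hypothetical, and the entries for 120-129, 100-109, and 80-89 are taken here as 8.2, 23.5, and 14.5.

```python
# (IQ range, per cent of the 1937 standardization group, classification)
TABLE_20 = [
    ("160-169", 0.03, "Very superior"),
    ("150-159", 0.2, "Very superior"),
    ("140-149", 1.1, "Very superior"),
    ("130-139", 3.1, "Superior"),
    ("120-129", 8.2, "Superior"),
    ("110-119", 18.1, "High average"),
    ("100-109", 23.5, "Normal or average"),
    ("90-99", 23.0, "Normal or average"),
    ("80-89", 14.5, "Low average"),
    ("70-79", 5.6, "Borderline defective"),
    ("60-69", 2.0, "Mentally defective"),
    ("50-59", 0.4, "Mentally defective"),
    ("40-49", 0.2, "Mentally defective"),
    ("30-39", 0.03, "Mentally defective"),
]


def percent_by_classification(rows):
    """Sum the per cent column within each classification."""
    totals = {}
    for _iq_range, per_cent, label in rows:
        totals[label] = totals.get(label, 0.0) + per_cent
    # Round away floating-point noise from adding decimal fractions.
    return {label: round(total, 2) for label, total in totals.items()}


for label, total in percent_by_classification(TABLE_20).items():
    print(f"{label:22s}{total:6.2f}")
```

On these figures the "Very superior" band holds about 1.3 per cent of the group, which agrees with the statement that such IQ's fall within the top 1.5 per cent.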


Wechsler Scales

History

Wechsler’s first test of intelligence, the Wechsler-Bellevue Scale, was
developed primarily for adults. In constructing the first scale Wechsler
analyzed various standardized tests of intelligence that were already being
used. He evaluated each test’s claim to validity on the basis of correlations
with published tests and empirical ratings of intelligence. The ratings
included teachers’ estimates and ratings by army officers and business executives.
In addition, Wechsler attempted to rate the tests on the basis of his own and
other psychologists’ clinical experience. Two years were devoted to experimental
work in trying out various tests on different groups with varying
intellectual abilities (Wechsler, 1958).

In revising the first scale* Wechsler also changed the name to the Wechsler
Adult Intelligence Scale (WAIS). The WAIS is a revision and complete
standardization of Form I of the Wechsler-Bellevue Intelligence Scale
(W-B) and provides more efficient measurement of the intelligence of
adolescents and adults between the ages of sixteen and seventy-five. Wechsler
(1955) states, “The extension of the Wechsler-Bellevue Scales and the
standardization of the modified instrument are represented by the new
Wechsler Adult Intelligence Scale” (p. 2).

The WAIS, like its predecessor the W-B, consists of eleven tests. The
Verbal Scale contains six tests; the Performance Scale has five tests. All
the tests in both the Verbal and Performance Scales are combined to make
the Full Scale. The following are the tests in each scale:


Verbal Tests

1. Information

2. Comprehension

3. Arithmetic

4. Similarities

5. Digit Span

6. Vocabulary


Performance Tests

7. Digit Symbol

8. Picture Completion

9. Block Design

10. Picture Arrangement

11. Object Assembly


The Wechsler Intelligence Scale for Children (WISC) was developed by
Wechsler in 1949, before the WAIS. It was an outgrowth of the W-B Scale
and overlaps with it in format and items. The main differences are the
addition of easier items and the independent standardization of the WISC
(Wechsler, 1949). The standardization group included a sample of 100 boys
and 100 girls at each age from five through fifteen years. The WISC has
the same Verbal, Performance, and Full Scale format. There are ten
basic and two alternative tests:


Verbal Tests 

1. General Information 

2. General Comprehension 

3. Arithmetic 


4. Similarities 

5. Vocabulary 

6. Digit Span (alternate)


Performance Tests

7. Picture Completion

8. Picture Arrangement

9. Block Design

10. Object Assembly

11. Coding or Mazes (alternate)

Digit Span and Mazes (or Coding) are considered supplementary tests, to
be added if time permits or if one of the tests has been invalidated.

* The W-B . . . Corporation, publisher of the W-B . . . Form I . . .
(. . . people now consider the W-B as obsolete.)

The latest Wechsler Scale, published in 1967, is the Wechsler Preschool
and Primary Scale of Intelligence (WPPSI) for children between four and
six-and-a-half years of age. The WPPSI, like the WISC, consists of a series
of ability tests which attempt to obtain evidence of various dimensions of
the young child’s intellectual competence. The WPPSI norms reflect, more
than any other intelligence test, socioeconomic and racial samplings. The
WPPSI contains six verbal tests (one an alternate) and five performance
tests.


Verbal Tests 

1. Information 

2. Vocabulary 

3. Similarities 

4. Comprehension 

5. Arithmetic 

6. Sentences (alternate) 


Performance Tests 

7. Animal House 

8. Picture Completion 

9. Mazes 

10. Geometric Design

11. Block Design


Eight of the eleven tests are similar to the WISC and provide the same
measures as the WISC. “Sentences,” “Animal House,” and “Geometric
Design” are new tests in the Wechsler series. 


Characteristics 

The Wechsler approach of grouping certain items in subtests under two
basic scales (Verbal and Performance) is a radical departure from the Terman-Binet
plan of grouping items according to difficulty. Binet and Terman
organized their material in successive age levels whereas Wechsler organized
types of tasks in the various subtests. Let us now turn our attention to the
various subtests and their content. The following items are similar to those
found in the WISC and WAIS. The subject is expected to give a generalized
and direct answer.

Information

How many toes do you have? (WISC)

How many days in a month? (WISC)

Where does syrup come from? (WAIS)

Who wrote Crime and Punishment? (WAIS)

Comprehension

What should you do when your nose bleeds? (WISC)

Why should people be honest? (WISC)

Why does an airplane have a motor? (WAIS)

Arithmetic

Seven blocks are presented in a row before the child and he is asked
to count them with his finger (timed, WISC)

At 5¢ each, what will four apples cost? (timed, WISC)

A woman with $20 spends $8.50. How much does she have left?
(timed, WAIS)

The price of frozen green beans is three packages for 45¢. What is the
price for nine packages? (timed, WAIS)

Similarities

For subjects under eight years of age and suspected mental defectives,
four “analogies” such as “Water is blue but grass is ______” are
given. If two of the four items are passed the examiner continues on
to the more difficult items. (WISC) For example, “In what way are a
pear and apple alike?”

WAIS items are similar but more difficult.

Digit Span (WAIS)

“I am going to say some numbers. Listen carefully, and when I am
through, say them right after me.” The examiner starts with three
digits and continues until nine digits or until subject misses two trials
on a series. For example, if 58264 is missed, another five digits are
given, and if subject misses again the test is stopped. If the correct response
is given the test continues.

After completing this first series of digits subject is asked to repeat
numbers backwards. For example, if the examiner says “683” the
subject should answer “386.” Test is discontinued in same manner as
previous series. WISC uses this same format for alternate test.
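
The discontinuation rule for Digit Span can be sketched in Python as follows. The function names are hypothetical, the subject is represented by a stand-in function reporting whether a given trial is repeated correctly, and taking the longest series passed as the result is an assumption made for illustration; the text above states only when the test is stopped.

```python
from typing import Callable


def longest_series_passed(repeats_correctly: Callable[[int, int], bool],
                          shortest: int = 3, longest: int = 9) -> int:
    """Apply the stop rule: two missed trials on one series ends the test.

    repeats_correctly(length, trial) says whether the subject repeats the
    series of that length correctly on trial 1 or trial 2.
    """
    span = 0
    for length in range(shortest, longest + 1):
        # The second trial is given only when the first one is missed.
        if repeats_correctly(length, 1) or repeats_correctly(length, 2):
            span = length
        else:
            break  # both trials missed: the test is stopped
    return span


def digits_backward(series: str) -> str:
    """Correct answer for the backward portion, e.g. '683' gives '386'."""
    return series[::-1]


# Hypothetical subject who manages every series up to five digits.
print(longest_series_passed(lambda length, trial: length <= 5))
print(digits_backward("683"))
```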

Vocabulary

Words such as “wagon” and “ruby” are presented to the subject on
the WISC while words such as “spring” and “digress” are included
in the vocabulary section of the WAIS. There are forty words on
both the WISC and WAIS. After the subject has had five consecutive
failures (no credit) this test is discontinued.

Picture Completion

“I am going to show you some pictures in which there is a part
missing. Look at each picture and tell me what part is missing.”
Subject may verbalize part that is missing or point to it.

Block Design

“You see these blocks have different colors on their different sides.
I am going to put them together to make something with them.
Watch me.” Four blocks are arranged according to a design on the
examiner’s card. Subject is required to make the same design using
the blocks as the model. If he is successful he is shown card number 2,
which has a design that he is to copy by arranging the blocks in the
same manner. Starting with designs that require four blocks and
continuing (if subject does not have three consecutive failures) up to
nine blocks, the subject is required to accurately reproduce the model
designs on the cards presented to him. Bonus credits are awarded for
speed in completing tasks on more difficult designs.

Picture Arrangement

A series of comic-like pictures are presented to the subject and he is
asked to put them in the correct order so they will tell a story. Bonus
credits are given for speed on completing tasks.

Object Assembly

“These pieces, if put together correctly, will make a girl. Go ahead and
put them together.” The examiner presents subject with cut-up
(puzzle-like) parts and asks subject to put them together. Time
bonuses for speed are given.


Digit Symbol

[Figure 13: items similar to the WAIS Digit Symbol test]

“. . . beneath have no marks. You . . .” . . . one point, while reversed
. . . are given half credit. See Fig. 13 for items similar to WAIS
Digit Symbol.


What Wechsler Scores Mean

Raw scores on each test . . . “scaled scores” or standard scores . . .
a mean of 10 and a standard deviation . . . together to produce three IQ’s . . .
deviation of 15. . . . found . . . Cronbach (1960) . . . it is hard to . . .
recognize . . .









Table 22 Classification of Mental Defectives According to IQ’s (WAIS)*

Classification   IQ Range       Percentage Included   Probable Error

Moron            69-50          1.9                   -3 to -5
Imbecile         49-30          0.32                  -5 to -7
Idiot            29 and below   0.002                 -7 and below

* From David Wechsler. The Measurement and Appraisal of Adult
Intelligence. (4th ed.) The Williams & Wilkins Co. © 1958 by
David Wechsler. Reproduced by permission of David Wechsler.


personality— and are considered along with past grades and opinions of 
school teachers. 

It should be noted that sometimes a child will have emotional problems
so severe that they interfere with his intellectual functioning and cause him
to score in the range of mental retardation when in fact he may possess an
average or even superior intelligence.

Teacher observations can also be valuable in finding these children. A 
sensitive teacher may notice that a child occasionally performs well, thus 
denoting ability, even though his usual performance is far below average. 


Other Diagnostic and Clinical Features 

The Wechsler Scales are more than tests that produce a psychometric IQ.
Important as that may be, they also assist in evaluating personality
characteristics . . . found, for example, that . . . performance section than
in the . . . considered significant . . . more between Performance and
Verbal IQ’s . . . (Wechsler, 1958).

Organic Brain Disease

Those who are . . . do better on verbal than on . . . of the Digit Symbol
and Block . . . (Digit Span).

Schizophrenia*

The schizophrenic usually . . . Information and Vocabulary and does poorly
on . . . Completion, or both. On the other hand, Wechsler states, “one finds
some schizophrenic patients who do well on one or several of the tests which
are characteristically failed by the typical schizophrenic” (p. 175).

* Schizophrenia is a . . . related group of mental disturbances . . . with
reality. The classical divisions are . . .

Anxiety Types 

The tests most sensitive to those suffering from anxiety, whether it be
pathological or of less severity, are Arithmetic, Digit Span, and Digit Symbol.
It has also been noted that these people tend to have lower Performance
Scores than Verbal Scores (Wechsler, 1958).

Obsessive-Compulsive Neurosis

Those suffering from obsessive-compulsive neurosis are characterized by
perfectionism, rumination, and rigidity. They, for example, never outgrew
“touching all the lines on the sidewalk.” The quality of verbalizations is
generally the most reliable sign of obsessive-compulsive trends on the
Wechsler Scales. Schafer (1948) cites several examples of their verbalizations,

“There is a good deal of dispute as to who invented the airplane
but the Wright Brothers get credit for it.” “If I were lost in the forest
in the daytime I might follow the sun ... or go by the moss on the
north side of the trees ... or maybe follow a stream. Do I have a
compass? If I had one I’d . . .” [p. 25].

Hysteria 

Hysteria is generally characterized by impulsiveness, egocentricity, 
tendency towards histrionics, and in severe cases conversion hysteria, such 
as paralysis of a limb with no apparent physical cause. Those with hysteria
generally obtain performance scores that equal or exceed the verbal level. 
On the verbal section one finds relatively good scores in Comprehension 
and a poor showing in Information. Among the performance tests Digit 
Symbol is usually well performed. 

General Neurosis 

In the area of general neurosis one may see children or adults with 
obsessive-compulsive, hysteric, and phobic reactions. These people usually 
reveal themselves by doing poorly on relatively easy items in Information,
Vocabulary, Digit Span, Arithmetic, Picture Completion, and Block Design 
while passing the more difficult items in these tests. These individuals have 
much higher verbal than performance scores (Schafer, 1948). 

It should be remembered that none of these diagnostic signs are true in 
all situations. They are only rough generalizations that reveal to the skilled 
clinical or school psychologist some clues to the overall characteristics of the 
person being examined. They are, of course, used along with other tests and 
forms of evaluation, such as interviews and day-to-day behavioral records.



The various syndromes mentioned do not represent all the possible diagnostic
categories, nor are they complete in any sense. They are intended as
illustrations of the broad range and use of the Wechsler Scales that go beyond
the psychometric IQ usage.

Two illustrative cases dealing with brain damage and anxiety follow to
show how the Wechsler is used in actual clinical practice (Wechsler, 1958).8


W-B I

Vocabulary               7
Information              8
Comprehension            9
Arithmetic               7
Digits                   6
Similarities             9
Verbal                  39

Picture Arrangement      6
Picture Completion       1
Block Design             1
Object Assembly          2
Digit Symbol             5
Performance             15

Verbal IQ               91
Performance IQ          50

Digits forward           6
Digits backward          3

Case 0-2. White, male, adolescent, age 14. Brought to hospital because of
marked change in personality. Had been normal boy until 6 months prior to
admission. Illness first manifested by failure at school and increased
irritability. Physical and neurological examination on admission essentially
negative. Case presented to illustrate value of Scale in detecting possible
organic brain conditions prior to manifestation of neurological symptoms.
Psychometric organic signs are: Verbal much higher than Performance; very low
scores on both Object Assembly and Block Design; large discrepancy between
Digits forward and Digits backward. On the qualitative side, subject showed
common organic manifestation of being able to reproduce designs if presented
with a model of assembled blocks (200), after failing completely with the usual
form of presentation.


WAIS

Information             12
Comprehension           11
Arithmetic              10
Similarities            15
Digit Span              10
Vocabulary              14

Digit Symbol            11
Picture Completion      15
Block Design            11
Picture Arrangement     14
Object Assembly         11

Verbal IQ              118
Performance IQ         116
Full IQ                121

Case An. 4. 16-year-old white male student who was admitted to a psychiatric
out-patient clinic; revealed the typical adolescent problems; tension
with his family, particularly in his relationships with his mother and sibling,
difficulty in school, and struggling to find a value system, perhaps a sense
of identity. His difficulties in school forced him to leave school shortly
before his admission to the out-patient clinic.

On admission patient gave a two year history of epigastric pain with an ulcer
demonstrated radiologically at different stages of healing or activity. The
present episode began three weeks prior to admission when an active ulcer was
demonstrated by x-ray and was treated medically. He was referred to a
psychoanalyst who saw him 3 times prior to admission. He was admitted in a
state of acute anxiety. Diagnosis: anxiety state.

Psychometrically, this patient did not show too large inter-test variability but
it is significant that the lowered scores on the Verbal part of the examination
were on Arithmetic and Digit Span. The Digit Symbol was not so much out
of line on the Performance part of the examination but was still one of the
lower scores. In this case we seem to be dealing with an individual with chronic
anxiety and a great deal of aggression directed inward. This would be better
indicated by his projective technique tests. The high scores on the Similarities
in contrast with the low score on Comprehension also suggests a possible
schizoid trend. This case suggests a much more complicated diagnosis than
the one assigned it clinically; it has been added to illustrate the presence of
anxiety (along with other symptoms) revealed by the psychometric pattern.

8 © 1958, David Wechsler. Used by permission of . . .


Diagnostic Cautions

The student of measurement must be very judicious in accepting the
preceding data as definitive. It is far from that and, as we have stated,
diagnosis must be made in conjunction with other sources of information. Many
psychologists do not feel the Wechsler scales are very authoritative in clinical
assessment. They would use them only as a “rough gauge” of behavioral
dysfunction. Cronbach (1960) states the case quite well when he says that
the “Wechsler yields a general measure of mental ability and a
verbal-performance difference, and beyond that can offer hints leading to further
study of the individual” (p. 202).


Evaluation of the Binet and Wechsler

The Stanford-Binet and the Wechsler . . . They are both standard equipment
. . . Though he may augment . . . Among general . . . the Wechsler . . .
behavior and provides a single . . . general level of intellectual . . .
psychometric vintage, mellowed by years of numerous . . . to which the
examiner can . . . for interpretive . . . especially at the lower age levels,
and until the recent advent of the Wechsler Preschool and Primary Scale of
Intelligence had no competitive peers.

The Binet . . . educational abilities . . . than the . . . diagnosis of
neurological . . . administration, and the . . . administer the tests . . .
highly trained . . . psychologists prefer to . . . the WISC . . . because . . .
practice and research . . . seen how the WPPSI . . . investigations.

Thorndike and Hagen (1969) in reviewing the Wechsler and Binet state,

. . . now in preferring . . . to academic success . . .



Binet’s. . . . for children from 7 to 15, a decision between the
two tests is not an easy one. The Binet is reported to be somewhat
more difficult and time-consuming to give. The usual Binet procedure
of carrying the examinee through to the point where he encounters a
long series of failures is judged to be a seriously upsetting matter for
some emotionally tense children. . . . The ultimate basis for choice
will be the validity of the inferences that can be made from each in
the situations in which they are actually used. Prediction of academic
success can apparently be made about equally well from either test.
It seems likely that the two tests are about equally useful for children
with mental ages of 7 or above [p. 305].

Anastasi (1968) feels that the Wechsler Scales “. . . yield an IQ with
high reliability and fair evidence of validity” (p. 301). Cronbach (1960) feels
that the Wechsler Scales are deficient in only one area, that is, the range is
not large enough to measure very high and very low abilities with accuracy.
Downie (1967) makes a related observation when he says that “Brighter
subjects score higher on the Binet, and duller ones higher on the Wechsler”
(p. 277).

It is obvious from our discussion that the Stanford-Binet and the Wechsler
Scales are the best individual measures of intelligence presently available.
They are time-consuming and expensive, however, in that only one person
at a time may be examined and only highly trained individuals such as
clinical or school psychologists may properly administer them. (See Merrill’s
“Training Students to Administer the Stanford-Binet Intelligence Scale.”)

In our schools today both the Wechsler and Binet are given only when
other tests and information seem unreliable or in special cases which require
intensive and thorough information, as for placement of a child in a special
class for retarded, emotionally disturbed, or perceptually handicapped
youngsters. When used for these special purposes and administered and
interpreted by a trained psychologist, they are extremely useful tools in
helping provide individualized instruction. Teachers and school administrators
should be certain, however, that the Binet or Wechsler results that
they consider are based on the report of a trained clinical or school
psychologist. (This means more than a course or two in individual testing.) This is
important not only because of the highly complex nature of the tests, but
also because of the clinical skills in observation needed to evaluate the
psychometric scores.


Nonlanguage Tests¹

Most intelligence tests contain, to some degree, tasks that are verbally
oriented. This is not surprising because the vast majority of learning and


¹ No specific distinction is made between “nonlanguage” and “perfor-
mance” tests in this section. Nonlanguage will suffice for both.




11 'di.d 

speak a foreign language or co j.fipiendes in hearing and speech, 

well as with those who have ,^ 5 , ^ ponw I of the Arthur 

A good example °f ™"' ” ,,^ 3 j ^Tje^ arc nine tests itr the Arthur 

Performance Scale (Arthur, lyja, ; 

Scale. 

1. Koo«C„.eisa..tofi.,....e.^^';J~ 

2 . 5e,rc‘Vor/S»^^^^ 

rp;:^rll?S=? .uhieet is told .0 puce them 

in" these holes as tat « h' (armboard than the Seguin, 

3 Tvio-Fkure Form Board is square and cross. 

Cuiup pieces are “ ^“li T«-fI.'C Form Doard but more 

arms, legs, 

^-'rrm'’:tc,“"'^ Lwc.--.ipa test which ha, eu.-ouu 

^^"''Jlmmhrasj^mbled or fitm^;-^^^^^^ , „ ^1, 

8 Porleiir Moron « basicat^ p„h from me 

’•"”^riS"m«ar.ochildrenUbloehs^^^^^^^^^^^^ 

11 of the Arthur initruction* for 

The Revised Form added fc**“;'* & Board. Porteu^ 

13 an alternate to Fof*" , Knox Cube. ^ ^ Arthur Sten 

deaf children, 'f ‘™;=,fcomrle<i°n- ^ 

Mares, and IlealJ Piem" ^ 




Design Test I was also added. This test calls for the subject to reproduce 
designs that are presented on cards by superimposing cut-out stencils upon 
a solid card. 

The subject is given points for his performance on each subtest of the 
Arthur Scale. These points are summed and the total score is converted to 
a mental age equivalent. An IQ may be obtained by the classical formula of
MA/CA × 100.
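
The ratio just described can be worked through in a few lines. The sketch below is only an illustration: the function name and the sample ages are assumptions, and it follows the usual convention of multiplying the MA/CA quotient by 100 and expressing both ages in months.

```python
def ratio_iq(mental_age_months: float, chronological_age_months: float) -> int:
    """Classical ratio IQ: mental age divided by chronological age, times 100."""
    if chronological_age_months <= 0:
        raise ValueError("chronological age must be positive")
    return round(100 * mental_age_months / chronological_age_months)


# Hypothetical child: chronological age 8 years (96 months),
# mental age equivalent of 10 years (120 months).
print(ratio_iq(120, 96))  # 125
```

A child whose mental age equals his chronological age obtains a ratio IQ of 100 under this formula.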

Another nonlanguage test of special importance to teachers is the
Goodenough Draw-A-Man Test (Goodenough, 1926). For many years
teachers have used this test in primary grades as an estimation of intelli-
gence and readiness for reading. In this test each child is provided a pencil
and a special sheet of paper and given the following instructions:

On these papers I want you to make a picture of a man. Make the
very best picture that you can. Take your time and work very care-
fully. I want to see whether the boys and girls in school can do
as well as those in other schools. Try very hard and see what good
pictures you can make [p. 85].

Figure 14 shows drawings of kindergarten children taken from Good-
enough’s (1926) chapter on scoring samples. Though there is an elaborate
scoring system, the experienced scorer can grade “forty to fifty papers an
hour, although in the beginning he may not have been able to score more than
five or ten an hour” (p. 87).

The important thing to remember is that the Draw-A-Man Test is not
scored on esthetic or technical qualities but on the completeness and develop-
mental maturity of the drawing. For example, a clear indication of the neck
as separate from the head and body is scored a plus, not a “mere juxta-
position.” Or in scoring for the head Goodenough (1926) allows credit for
“any clear method of representing the head. Features alone . . . without
any outline for the head itself, are not credited for this point” (p. 91).

There are many more nonlanguage tests available for school use. A good
many of these may be given in groups as well as individually. Among some
of the more noteworthy are the Pintner Non-Language Test, in which all
instructions may be given by pantomime, and the Nebraska Test of Learning
Aptitude, which was standardized on deaf and near-deaf children (individual
test).


Culture-Fair Tests 


In Chapter 2 we discussed the issues surrounding the problem of cultural
bias in tests. This section will not deal with the controversial aspects of
cultural test bias but only with a discussion of the efforts to develop culturally
fair instruments. Not too many years ago the term used was culture-free;
however, this designation was soon dropped when measurement people,








after reviewing years of experimentation and clinical data, realized that no 
test is “culture-free.” The aim today is not to produce an instrument free 
of culture but an instrument relatively fair to different cultures. For example, 
the American test that assumes a basic American background in football 
and baseball and uses items that reflect this exposure would not necessarily 
be fair for an English or German student and certainly would not be appro- 
priate for an African village boy or girl. 

Culture-fair tests are similar to nonlanguage tests in that they are almost 
always nonverbal. They must, of course, be more than nonverbal; they must 
also be free of cultural bias. The Army Beta (see Chapter 2) is a nonlanguage 
examination; however the Picture-Completion Test in it has items such as 
a violin, gun, and pocketknife which are cultural products. 

One of the first attempts to develop a culture-free test was the Cattell 
Culture Free Intelligence Test. This test is based on the underlying assump- 
tion that general intelligence is a behavioral act of seeing relationships in the 
things with which we have to deal and that this aptitude to view relationships 
may be tested with simple diagrammatic or pictorial material. In addition, 
if the test is to be used in different cultures, the pictures or forms of objects 
should be universal and not confined to any cultural group. There is no 
evidence that, in fact, the test is useful for different cultures (Thorndike 
and Hagen, 1969). 

Another “culture-free” test, developed in Great Britain by Raven (1938, 
1947, 1962), is the Progressive Matrices. It is a nonverbal test series requiring 
the subject to solve problems presented in abstract figures and designs. The 
1938 form consists of sixty matrices (designs) from each of which a part has 
been removed. The subject is required to choose the missing insert from six 
to eight alternatives. The 1947 form contains the two most elementary sets 
from the 1938 edition plus additional items. The 1962 form is a thirty-six-
problem series designed for people with above average intelligence. All forms
have norms obtained from an English population. The Progressive Matrices, 
though useful in a clinical setting, have not shown any substantial research 
data in terms of validity or reliability. 

A great deal of research has taken place since Cattell first introduced the
Culture Free Test. Since then numerous tests purporting to be culturally
free or culturally fair have been constructed and tested with few positive
results. (See, for example, Fowler, 1955, and Coleman and Ward, 1955.)
Their predictive validity has not been as good as that of other “culturally biased”
instruments, yet they do not eliminate this “bias.” The research continues
to demonstrate that lower-class children perform generally as poorly on
culture-free instruments as on other less culturally controlled devices.

Wesman (1968) feels that the search for the culture-free or culture-fair
is “sheer nonsense.” He states,


The implicit intent in the attempt to create culture-free or culture-
fair tests is somehow to measure intelligence without permitting the
effects of differential exposure to learning to influence scores. This

ingenious, but ingenuous [p. 269]. 

Infant and Preschool Tests 

about everj’body being eq , ^ interest in the ' smart- 

paS" bab, or preschooler [Ilg and Antes, .9a5, 

p. 189]. 


Gesell (1940) was one of the te“™s°es. Hi. first test. 

“"Ssrr..7S-”Ki;K 

lugar pellet, and a pencil ?"■< Wf J,e sees it on a table top or other 
oecurs at about four months of “Sei I ,eaction is nsual onij. 

pS of furniture propped befotejum. This , hi. 

a child are the follomng ■ ^ ,1,/child turn to look at a ''Sb , 

he uulk and if so « !„ pick up a block, spoon, or tins pellet 

mcnul stages of th ..-ormal” sample v.crc m Amcrican-bom 

constituting a genera ^ problems. They -.ijercd of middle- 

judged healthy w»th extraction and were infants 

iarints of Northern E“opea^^„mic and edueatiooaU.el.^l 

class status in terms and miJc larer at eshicen 

were examined at the pf^gc. FotIo«TJI« period rc- 

intervaU until r, fi«. “*”* nV?rtcU DVclopmcntal Sch^ul" 

months, at two. “d ^nutruda.l93S:Gc^^ 

c.\aminations . dau obuined (Ge*c • . 

rereconsnue.edf-.H.i^ 


949). (See Figure '^' 5 . 6 ) outlm^ i„ 

Ciell and Amatruda (1941. j„d„pmen>. 

mining the child's level 


brief 


summary: 



Figure 15. Gesell Developmental Schedules materials. (Reproduced by permission
of The Psychological Corporation.) 

1. Motor behavior: Postural reactions, head balance, sitting, standing, 
creeping, walking, manipulation of objects and so forth. 

2. Adaptive behavior: Eye-hand coordination in grasping, reaching,
and manipulation of objects.

3. Language behavior: Facial expression, gestures, postural move-
ments, and speech. Also comprehension of other people’s speech is
noted.

4. Personal-social behavior: Feeding, toilet training, play, smiling,
and so forth.

The Gesell Developmental Schedules are for the most part observational
“tests.” Approximate developmental levels in terms of months in each of
the four areas are scored.

Other infant tests such as the Cattell Infant Intelligence Scale, California
First-Year Mental Scale, and the Kuhlmann-Binet are more in the classic
tradition of standardized tests, although they all have items in common
with the Gesell and Binet scales.

Validity and reliability data on infant tests are generally quite low. Bayley 
(1949) studied the relationship of test results in the first year of life with later 
scores using the California First-Year Mental Scale, California Preschool 





Scale, and the Stanford-Binet. The results were very disconcerting. For
example, scores obtained by infants tested at ten, eleven, and twelve months
correlated with scores obtained at five, six, and seven years of age at 0.20.
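
To put a coefficient of this size in perspective, the square of a correlation gives the proportion of variance that two sets of scores share. The short sketch below applies that standard relation to the 0.20 figure; the function name is an assumption introduced for illustration and does not come from Bayley's study.

```python
def shared_variance(r: float) -> float:
    """Proportion of variance two measures share, given their correlation r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation must lie between -1 and 1")
    return r * r


# Correlation between first-year scores and scores at ages five to seven.
print(round(shared_variance(0.20), 2))  # 0.04
```

In other words, a correlation of 0.20 means that only about 4 percent of the variation in the later scores is associated with the infant scores.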

It is this writer’s opinion that infant tests are too crude today to put much

on any of the infant tests is to court p bcst-knowTi tests 

The m age isTom two to four. There 

designed for preschool ,y.thtee scotable test areas, 

are thirty-eight tests which y “standing on one 

Only four call for verba -^2^wlh scissors.’’ 

foot,” "building a W"* (G„„denough and Maurer, W42) is an- 

The Minnesota Preschool Scale ( contam.ng 

other widely k"""'" sirnilar to the Binet. Some of the 

twenty-six tests, with a format very 
tests are the following: 

• 'Tk* ^avaminer draws a vertical stroke 
ani Aela aot^Th/^^ “thick examiner taps in 

U .■ Cclcr,. Child is asked .0 name the colot of cards that ate 
red, blue, pink, takes paper and folds i. in three 

The ntost outstanding 

"'TreWechsler Preschool an^^^^ 


an instrument tests that are cattcu 

to Wood and Deal {1 ; preschoolers a 

quently tor the , California f''?;;) “ f ji„tal Tests, and the 

Infant Intelligence . ,„,ni-Palmet Scale of Wen 

D-efop-ntalSce" 


more detailed analysis o 
Buros (1965) 




References 


Anastasi, A. Psychological testing. (3rd ed.) New York: Macmillan, 1968.

Arthur, G. A point scale of performance tests. Vol. II. The process of standardization.
New York: Commonwealth Fund, 1933.

Arthur, G. A point scale of performance tests. Vol. I. (2nd ed.) Chicago: Stoelting,
1943.

Arthur, G. A point scale of performance tests. Revised Form II. Manual for administer-
ing and scoring the tests. New York: The Psychological Corporation, 1947.

Bayley, N. Consistency and variability in the growth of intelligence from birth to
eighteen years. Journal of Genetic Psychology, 1949, 75, 165-96.

Buros, O. K. (Ed.) The sixth mental measurements yearbook. Highland Park, N.J.:
Gryphon Press, 1965.

Chauncey, H., and Dobbin, J. E. Testing has a history. In C. I. Chase and H. G.
Ludlow (Eds.), Readings in educational and psychological measurement. Boston:
Houghton Mifflin, 1966. Pp. 3-17.

Coleman, W., and Ward, A. Comparison of Davis-Eells and Kuhlmann-Finch
scores of children from high and low socioeconomic status. Journal of Educational
Psychology, 1955, 46, 465-69.

Cronbach, L. J. Essentials of psychological testing. (2nd ed.) New York: Harper &
Row, 1960.

Downie, N. M. Fundamentals of measurement: Techniques and practices. (2nd ed.)
New York: Oxford University Press, 1967.

Fowler, W. L. A comparative analysis of pupil performance on conventional and
culture-controlled mental tests. Unpublished doctoral dissertation, University of
Michigan, 1955.

Gesell, A., and Amatruda, C. The psychology of early growth. New York: Macmillan,
1938.

Gesell, A., and Amatruda, C. Developmental diagnosis: Normal and abnormal child
development. (2nd ed.) New York: Hoeber, 1947.

Gesell, A., et al. The first five years of life: A guide to the study of the preschool child.
New York: Harper & Row, 1940.

Gesell, A., et al. Gesell Developmental Schedules. New York: The Psychological
Corporation, 1949.

Goodenough, F. L. Measurement of intelligence by drawings. New York: Harcourt,
Brace and World, 1926.

Goodenough, F., and Maurer, K. The mental growth of children from two to fourteen
years: A study of the predictive value of the Minnesota Preschool Scales. Minne-
apolis: University of Minnesota Press, 1942.

Ilg, F. L., and Ames, L. B. Child behavior. New York: Dell, 1955.

Ilg, F. L., and Ames, L. B. School readiness: Behavior tests used at the Gesell
Institute. New York: Harper & Row, 1965.

McNemar, Q. Lost: Our intelligence. Why? In C. I. Chase and H. G. Ludlow
(Eds.), Readings in educational and psychological measurement. Boston: Houghton
Mifflin, 1966. Pp. 180-97.

Merrill, M. A. Training students to administer the Stanford-Binet Intelligence
Scale. Testing today. Houghton Mifflin (three-page letter to the Editor, distribu-
ted by Houghton Mifflin, no date).




Raven, J. Progressive matrices, Forms 1938, 1947, 1962. New York: The Psycho-
logical Corporation.

Schafer, R. The clinical application of psychological tests. New York: International
Universities Press, Inc., 1948.

Stutsman, R. Mental measurement of preschool children. New York: World Book
Co., 1931.

Terman, L. M. The measurement of intelligence. Boston: Houghton Mifflin, 1916.

Terman, L. M., and Merrill, M. A. Measuring intelligence. Boston: Houghton
Mifflin, 1937.

Terman, L. M., and Merrill, M. A. Stanford-Binet Intelligence Scale: Manual for
the third revision, Form L-M. Boston: Houghton Mifflin, 1960.

Thorndike, R. L., and Hagen, E. Measurement and evaluation in psychology and
education. (3rd ed.) New York: Wiley, 1969.

Wechsler, D. Wechsler Intelligence Scale for Children (manual). New York: The
Psychological Corporation, 1949.

Wechsler, D. Manual for the Wechsler Adult Intelligence Scale. New York: The
Psychological Corporation, 1955.

Wechsler, D. The measurement and appraisal of adult intelligence. (4th ed.) Baltimore:
Williams and Wilkins, 1958.

Wechsler, D. Manual for the Wechsler Preschool and Primary Scale of Intelligence.
New York: The Psychological Corporation, 1967.

Wesman, A. G. Intelligent testing. American Psychologist, 1968, 23, 267-74.

Wood, P. L., and Deal, T. N. Testing the early educational and psychological
development of children, ages 3-6. Review of Educational Research, 1968,
38, 12-18.



CHAPTER 


Projective Techniques 


Projective techniques are not tests in the true sense, because there are no
right or wrong answers. One cannot obtain a perfect score or fail. Certainly
there are “right” answers in the sense that a group of certain kinds of
responses reveal to the trained examiner personality traits which may be
pathological. There are, however, no definitive raw scores that point to
mental illness or the absence of disturbance.

Projective techniques are used in personality testing in order to explore
the individual’s world of make-believe. To accomplish this, material that is
indefinite and vague is presented to the person who then is to respond in his
own unique manner.

Frank (1939) introduced the term projective techniques long after the actual 
instruments had been in use. He thought of them as similar to the X-ray 
methods of medicine. Frank and others have delineated the projective 
technique from other personality instruments by focusing on the ambiguity 
or unstructuredness of the stimulus material, thereby allowing the subject 
almost complete freedom of response. With an objective personality test 
(see Chapter 12), on the other hand, the subject responds to a limited number 
of predetermined responses. 

Another distinctive characteristic of projective techniques is their indirect-
ness; that is, the subject being examined is not completely aware of how his
responses are going to be evaluated and is therefore less inclined to fake or
resist answering. For example, if someone asked you, “Do you love your

r.d"e- » stsir 

projecTivc techniques and objective per«,nahty tcstrng. He states, 

...rreareforcedtoconcludethattheMentUyW 

as a unique class *^‘^*^'^“‘^f'j^pons7offered the examinee or the 
either the amount of freedo At best, tests might be 

= 5 - 

j quite unlihely [p. 201]. 

The reader of this hook n«d 

s:tvThafa‘’;?"— 

gn L responsei through '■"^J^rTha, is, 

Lathing can be called a ’h„|ogist who reports h.stad.np 

made from a sample of brta''">';7,'’'XCn what the snbjecfs IQ ts. H 
from Intelligence testing has mote t y j p person during the 
lrr:cs thf quality of the same or ^milar manner 

exLX'did';?^"^^ 

with each A,, „ project himsc f into ■ • J ,, 

mote opportunity he has plot, a picture, or a wo 

advantage of this A. "grperson's feelinp oA*°A,id js the favorite 

a means of ^’'^'■"5/^ vHthin a eliniml semn^ 

The projective dov“ and to V"”" * rSeorftieal base the 

personality test of the cun s ,v,th 

psychologist. Most pt ) „ behavior. P®?' Projective techniques 

psychoanalytic system of h techniques. 

Other orientations also assumptions 

• riQ6l) discuss m gr jnterested in pursuing 
1 Ainsword. (.«>) ” 




are not only used in ascertaining personality characteristics but also in

“Toul^ehotpr^P^^^^^^^ 

and1nt°enSs"“havr TprevSr enlV The school 

psychologist is usually the only school person trained to administer a projective
technique, and he uses it with only a small number of students who present
serious emotional and/or intellectual problems. Projective techniques are not
given to all students. In addition, when the school psychologist does administer
a projective device, in almost all cases he obtains parental consent.

The number and various types of projective techniques are so great that a
review, in depth, of them all is beyond the scope of this book. We will,
therefore, focus our attention on a few that you as a teacher or counselor
may encounter in the schools. (For a thorough and extensive survey and
evaluation of projective techniques see the Mental Measurements Year-
books. These volumes have a separate section devoted to projective tech-
niques.)
In the remainder of this chapter we will focus our primary attention on 
the Rorschach, Thematic Apperception Test, and Draw-A-Person. 


Rorschach 


Overview

The Rorschach is one of the basic diagnostic tools of most psychologists.
This test is sometimes referred to as the ink-blot test. It consists of ten cards,
each having a different ink blot; five are printed in black and white and five
in color.

The psychologist shows one card at a time to a child (or adult), asking him
to tell what the ink blot makes him think of and what it may mean to him.
After the initial instructions, the psychologist does not directly help or
instruct the subject except to show him the cards. After the ten cards have
been given they are presented a second time. In this phase the psychologist
attempts to find out what in the ink blot made the person answer as he did.
For example, the psychologist may state, “What in card 1 made you think
of a bat?” In this manner the psychologist gains insight as to where in the
ink blot the person saw the “bat” as well as what in the person’s background
made him think of a bat.

Throughout the Rorschach examination, the psychologist records in
detail the person's responses. After the test is completed, an analysis of the
record is made. The scoring and interpretation of the Rorschach record is a
long and complicated task, and the psychologist needs a great deal of training
and experience to do it competently. Thus the administration and scoring
of a Rorschach should not be undertaken by a teacher, counselor, or even
some psychologists who have not undergone special training. The Rorschach
should only be administered by trained clinical or school psychologists and
only to students who are in need of it.

History 

The earliest use of ink blots that »elmo«of,va^e^ 

Klehog,aphie«, published m Germany rn 1857 J possibilities in 

recounts .bat he accidentally “med to'take on as he 

ink blots by noting the bizarre gu e that it seemed almost 

obser^^ed them. Kemer’s He had 

impossible to reproduce mk blots J of the ink blot per 

experienced the interplay between Kcrner did not realize the 

;”ffieInceofh.“^^^^^^ 

one TSS 

of the ink blots to persooa^ity diagoos . „„ 

Binet and other psychologists “u Mot,. The ink blots ^re 

the cootent of the subjects (Klopfer and Kelley. 1946 ■ 

used as stimulus "’“'"'9'. Herman Rorschach, a Sniss psychiatrist, 

than just tvhat was seen. , after the \ 

publications of the techn q Under Levy', Samuel B United 

Lh as David Levy, m Rorschach technique m the 

doctoral dissertation, m > influence and 

acceptance among characteristics, 

evaluation of personality chara 

Test Administration

The relationship between the examiner and subject is extremely important.

Str^'%;LfdS^Sirto.herhand,^^^^^^ 

noted and is used m completing 




et al. (1954) state, “Although it is desirable to have ‘good’ rapport, it is
unnecessary to assume that this is essential in order for the test report to
be valid. It seems more important that the relationship, whatever it is,
should be clearly perceived and understood by the examiner” (pp. 3^).

The actual examination consists of three basic phases. Let us briefly 
review these in the context of an actual examination using only one of the 
ten cards as an example. The reader should note that in the examination 
each phase is continued until the completion of the ten cards and then the 
next phase is initiated. 

Performance Phase — Card I

Examiner: I have some ink blots to show you and I want you to tell
me what they make you think of, what you see, and what it
might mean for you. Remember, this is not a test in the
usual sense of failing or passing. You can’t get a “hundred”
or a “zero” as you can on a test in the classroom. People
see all kinds of things in these ink blots. Now tell me
what you see and what they make you think of.²

Subject: It looks like a bat with large wings. It’s . . . ah . . . it’s
flying.

Examiner: Anything else; remember, tell me what you see and what
it makes you think of.³

Subject: Just a bat flying — nothing else. (Subject is given the next
card and responds; this procedure is followed until ten cards
have been administered.)

Inquiry Phase — Card I

Examiner: You did very well. Now we are going to go back and look
at the cards again and I want you to tell me what in the
cards made you think of what you said. In card 1 you saw a
bat flying; now tell me what in the card made you think
of a bat.

Subject: It just looks like a bat.

Examiner: It just looks like a bat?

Subject: Yes, it looks like a bat; it has wings, a body, and face,
and its shape just looks like a bat to me. (Procedure is
followed until ten cards are completed.)

² It should be noted that the language is, of course, varied according to the subject
and situation, although the basic instructions remain the same.

³ After this prompting the examiner will not say anything else throughout this
phase of the examination.




Testing of the Limits — Card I

Examiner: You said you saw a bat in card one?

Subject: Yes, because of its shape.

Examiner: Any other reason?

Subject: No, just its shape.

Examiner: What about the color black?

Subject: Yes, that’s right. I think black and death and bats all
are the same. Like vampires — black is the color — but also
the shape was important. (This phase is used only on cards in
which the examiner needs more information.)

Summarizing, then, the three basic phases and their modes of operation 
are: 


Performance. Nondirective and free association period where
subject gives the first thoughts that come into his
mind.

Inquiry. Some nondirective prodding to find out what made
the subject respond as he did. This phase makes
scoring possible and gives the subject a chance to
supplement and complete his original responses.

Testing the limits. Degrees of pressure exerted to find out what in the
card made the subject respond as he did. Used
when the two other phases have not produced a rationale
for responses. This phase is not scored, but used as
clinical evidence.


Scoring 


There are two widely used and different procedures for scoring the
Rorschach (Beck, 1944; Klopfer and Kelley, 1946; Klopfer et al., 1954).
The scoring procedure we will discuss is based on the Klopfer technique.

The examiner in scoring the Rorschach has the option of using the quanti-
tative method and/or content analysis. Let us look at the quantitative pro-
cedure first.

First the examiner looks for the location of the response. Where in the
blot did the subject see, for example, the bat? If he used the whole blot,
the response is scored W. If he used a large part of the blot, it is scored D.
If he only used a small unusual part of the blot, it is scored Dd. Use of the
white space rather than the black ink blot is scored S.

The next concern of the scorer is the determinant for the subject’s re-
sponse, that is, the characteristic of the blot that prompted the response. The primary
determinants are figures in human-like action (M), animals in animal-like
action (FM), abstract movement (m), shading (fc), and color (c). In addition,
each response is evaluated on its relevancy to the blot.




The third category is content. This area is focused on what actually is
seen in the blot. Among some of the frequent content responses are human
figures (H), human details (Hd), animals (A), objects from animal parts,
like fur (A obj.), man-made objects (obj.), and so forth.

Each response is also analyzed to determine if it is original or popular.
The definitions of original and popular response depend on the classification
system used. In general terms, however, a popular response is one given by
a majority of subjects, whereas an original response is one that is rarely
if ever seen.
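
The scoring categories described above lend themselves to simple bookkeeping. As a minimal sketch, assuming each scored response is kept as a small record, the location symbols can be tallied as follows. The field names `location` and `content`, the function name, and the sample record are all hypothetical, invented for illustration; they are not part of the Klopfer system. The tally is clerical only and does not interpret the record.

```python
from collections import Counter

# Location symbols as described in the text.
LOCATION_CODES = {
    "W": "whole blot",
    "D": "large part of the blot",
    "Dd": "small unusual part of the blot",
    "S": "white space",
}


def tally_locations(responses):
    """Count how often each location symbol occurs in a scored record."""
    counts = Counter()
    for response in responses:
        code = response["location"]
        if code not in LOCATION_CODES:
            raise ValueError(f"unknown location symbol: {code!r}")
        counts[code] += 1
    return counts


# Hypothetical scored record, invented for illustration only.
record = [
    {"card": 1, "text": "a bat flying", "location": "W", "content": "A"},
    {"card": 2, "text": "two people", "location": "D", "content": "H"},
    {"card": 3, "text": "a face in the white part", "location": "S", "content": "Hd"},
    {"card": 4, "text": "an animal skin", "location": "W", "content": "A obj."},
]

locations = tally_locations(record)
print(dict(locations))  # {'W': 2, 'D': 1, 'S': 1}
```

Counts of this kind are only the raw material for the interrelationships among scoring classifications that the trained examiner studies.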


Interpretation 

The interpretation of a Rorschach record is a difficult and laborious process 
requiring rigorous training and clinical experience. A detailed analysis here 
would be at best an incomplete statement. Several textbooks dealing only 
with the Rorschach are usual for the serious study of the technique. There- 
fore, the reader should note that what follows is only a simple and extremely 
brief summary of some of the key features of interpretation. 

Quantitative Analysis. In this approach the interrelationships between
various scoring classifications previously mentioned (e.g., W, FM, A, P)
are noted and studied. Some of the possible interpretations, for example,
are:


1. A great many human responses (M) that are appropriate for the
given ink blot and are sufficiently explained are a sign of high
intellectual endowment.

2. When M’s appear in optimal relationship with animal responses 
(FM), this is a sign of self-acceptance. 

3. Color responses, shading, and texture give evidence of the subject’s 
emotional life. 

4. The location areas that the subject chooses may suggest his typical 
approach to problems and situations with which he deals in everyday 
life. 

Content Analysis. Certain responses may be interpreted symbolically;
for example, snakes and totem poles may indicate sexual feelings. The
absence of human figures may indicate the subject has little empathy for
people. Animal responses in young children are quite normal; however, an
adult who gives a great many may be infantile. In addition, certain cards
usually evoke certain responses; the absence or avoidance of these may reveal
characteristics of the subject.

Synthesis. The proper interpretation of the Rorschach requires the exam-
iner to bring to bear all the features mentioned in an analysis of the quantita-
tive and content relationships and their frequency; of the consistency of the
various hypotheses that are derived and of how well they go together with the




nM rnT',' This process requires 

not only training and skill but an intuitive approach called clinical judgment.
This dependence on clinical feeling is one of the major hurdles to be over-
come in the statistical validation of the Rorschach.


Evaluation

The Rorschach is a very difficult instrument on which to establish statistical
validity because of its inherent reliance on the clinical experience of the
person scoring and interpreting the record. The various studies and investi-
gations do not reveal empirical validation for the scoring system of the
Rorschach. The Rorschach has been found to have little predictive validity
when compared to psychiatric diagnosis and other criteria. The studies that
do reveal validity have been criticized on methodological grounds. Thus, it
must be stated that from a statistical and experimental viewpoint the Ror-
schach is no better than an interview device. On the other hand, it must also
be stated that a great many clinical and school psychologists view the Ror-
schach as a valid and reliable instrument based on their professional experi-
ences. In addition, psychiatrists rely heavily upon the psychologists’ findings,
which are based largely though not completely on the Rorschach.

In the author’s experience as a clinical and school psychologist the
Rorschach has been found to be invaluable. Although recognizing its limita-
tions, the author feels it is valid if administered and interpreted by a trained
and experienced person. Its chief limitation to the clinical user is that it is
time-consuming. The average administration, scoring, interpretation, and
write-up of the Rorschach takes between six and eight hours.

The Rorschach is used by the school psychologist only when other devices
are inadequate and only when parental permission is obtained. In these
circumstances it can be of invaluable assistance.


Thematic Apperception Test 

Another projective technique in wide use is the Thematic Apperception
Test (TAT). It was developed in 1935 by two psychologists, H. A. Murray
and C. D. Morgan, of the Harvard Psychological Clinic (Murray, 1938,
1943). The TAT is almost as widely used as the Rorschach. Clinical and
school psychologists use it as one of the basic instruments in their psycho-
logical test battery. In addition to its years of service in the diagnostic test
battery, it has also served as a model for later instruments which have the
same story-type format.

Test Administration 

The TAT consists of a set of pictures showing human figures in different 
poses and actions. Some of the pictures are only for boys, others for girls, 
some for adults (over fourteen years of age), and others are for all individuals. 
There are nineteen pictures for a particular age and sex and one blank card. 
The psychologist, in most cases, does not administer all twenty cards but 
selects only those he considers particularly appropriate for a given person. 

The psychologist instructs the subject to tell him a story based on the 
picture presented. He requests a past, present, and future for each of the 
stories. The exact instructions sometimes vary, but they always include 
the request for a past, present, and future in the subject’s story. Generally, 
the instructions are the following: 

I am going to show you some pictures. I’d like you to tell me a 
story about what is going on in each picture, what has led up to it, 
what is happening now, and what may happen in the future. Remem- 
ber that there are no right or wrong stories, only what you see. Here’s 
the first picture — now tell me a story about what has happened and 
what is happening and what you think will happen. 

The TAT stories are either taped or taken down verbatim by the 
psychologist. No time limits are given and the subject may tell a short or 
long story, depending on how he feels. 

Scoring 

The scoring of the TAT is not quite as time-consuming as the Rorschach. 
Many different scoring techniques have been advocated for the TAT. 
Shneidman (1951) presents many of the methods commonly used by 
recognized experts. These range from an emphasis on the content of the 
stories (interpersonal relations, parent-child, etc.) to a statistical-normative 
approach. Whatever method is used in interpretation, the examiner attempts 
to obtain a whole picture by analyzing all the cards administered rather than 
by drawing conclusions from only one card. 

Let us examine one commonly used method, a content analysis of the 
stories. This is a clinical method whereby the examiner attempts to discover 
the psychodynamic causes of disturbed behavior or of the level of adjustment. 
The first step is to read the record for a general impression. Secondly, the 
examiner summarizes each story and obtains its salient features. Thirdly, 
each story is analyzed for possible conflict, mother-child relationships, 
sexual identity, hostility, aggression, defenses (for example, rationalization 
and projection), and so forth. After these analyses have been made, they are 
put together to form a general picture. Themes such as anxiety, sibling 
conflicts, and other disturbing features that recur on more than one card 
are looked for. As Lasaga (1951) states, 

If a person has extremely intense worries or conflicts, these worries 
or conflicts will show up again and again in a large number of the 
stories of the test, possibly in most of them. This means that every 
TAT certainly reflects several aspects of the whole personality of the 
subject being tested, but, when one or more intense conflicts exist 
in the subject’s life at the moment the test is made, what will appear 
first of all in the TAT will be these conflicts [p. 145]. 
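
The search for themes that recur on more than one card can be organized as a simple tally. The following is a minimal sketch in Python, not a published scoring method. It assumes the examiner has already read each story and noted its themes by hand; the function name, card labels, and theme labels are all hypothetical.

```python
from collections import Counter


def recurring_themes(card_themes, min_cards=2):
    """Return the themes noted on at least `min_cards` different cards."""
    counts = Counter()
    for themes in card_themes.values():
        # A theme is counted once per card, however often it was noted there.
        counts.update(set(themes))
    return {theme: n for theme, n in counts.items() if n >= min_cards}


# Hypothetical examiner notes: card label -> themes judged present in the story.
notes = {
    "card 1": ["anxiety", "inadequacy"],
    "card 2": ["sibling conflict"],
    "card 3": ["anxiety", "sibling conflict", "hostility"],
    "card 4": ["anxiety"],
}

print(recurring_themes(notes))
```

A count of this kind only organizes the examiner’s notes. Deciding which themes are present in a story remains a clinical judgment.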

Let us look at the story of an eleven-year-old boy in the fifth grade. The 
story is in response to a TAT card which presents a boy of about the same 
age sitting down and looking at a violin and bow with his hands supporting 
his face. 

Well this is the past. He was going with his mother to buy a violin 
and dropped it and tripped over it and fell flat on his face and blood 

fall again. 

The boy who gave this story ... because of academic and ... children and 
felt inadequate ... “... as good as other kids, ... behind. I am dumb.” 
When asked about school he ... “What do you think ...” The other TAT 
stories as well as his Rorschach and other tests revealed a child who lacked 
social ... situations. More importantly, ... indicating possible criminal 
behavior as he grew older were noted. Psychotherapy was recommended. 

Lasaga (1951) ... appear at the conscious, partially conscious, or 
unconscious level while the subject is taking the test. Thus, 
Lasaga (1951) concludes that, 

When a test ... completely ... it is supposed that such stories ... 
his real life [p. ...]. 

Evaluation 

The TAT has been used in many research investigations. Little ... 
validity of the TAT ... terms does not warrant its use. 



supported the claims of TAT advocates, clinical day-to-day experience has 
shown its wide and beneficial use in clinics, hospitals, schools, and in industrial 
appraisal for executive employment. 


Other Projective Story Techniques 

Make a Picture Story (MAPS) 

The MAPS test was developed by Shneidman in 1947 and has been 
referred to as a “younger brother” of the TAT. It is different from the TAT 
in that it varies the material by separating the figures and backgrounds and 
allows the subject to select and place his choice of figures on a blank 
background before he relates a story. 

The materials for the MAPS consist of twenty-two background pictures 
(8½ by 11 in.) printed achromatically on cardboard. With two exceptions, 
there are no figures in any of the pictures. Some of the unstructured 
backgrounds are a stage and a blank card; others are semistructured, such as a 
forest and desertlike landscape; and others are structured, such as a medical 
scene and bathroom. 

There are sixty-seven figures with various facial expressions and various 
types of dress. Among these are males and females, children, Negroes, 
Orientals, Mexicans, and animal figures. See Figure 16, which presents the 
figures used. 

Shneidman (1951) illustrates a typical administration of the MAPS: 

What I am going to do is show you pictures like this, one at a time. 
[Livingroom background picture was placed directly in front of the 
subject.] You will have figures like this [at this point all the figures were 
poured out of their envelope onto the table top] and your task is simply 
to take one or more of any of these figures and put them on the 
background picture as they might be in real life. We might start by 
sorting the figures so that you can see each one. Spread them out on 
the table. 

After all of the figures had been placed on the table by the subject the examiner 
stated, 

I would like to go over the instructions ... all you are to do is take 
one or more of any of these figures, put them on the background as 
they might be in real life, and tell a story of the situation which you 
have created. In telling your story, tell, if you can, who the characters 
are, what they are doing and thinking and feeling, and how the whole 
thing turns out. Go ahead [p. 19]. 




Figure 16. Make A Picture Story materials. (From E. S. Shneidman. Reproduced 
by permission of The Psychological Corporation.) 


The MAPS is essentially an unstandardized test investigating psychosocial 
areas of fantasy. A formal scoring system has been developed, but there are 
no statistical data that support its validity or reliability. As with the Rorschach 
and TAT, its primary role is in clinical, rather than objective, evaluation. 

The Blacky Pictures 

These pictures were originally constructed to test certain psychoanalytic 
concepts. The Blacky Pictures (Blum, 1950) consists of ten cards that display 
cartoonlike drawings. These cartoon drawings center around the “adven- 
tures” of a dog called Blacky. 

Blacky can be of either sex, depending on the projection and identification 
of the subject. The test may be used with children but was originally designed 
for adults. The main theme of the cartoons centers on psychosexual development. 
The method of administration is like that of the TAT except that 
the subject is asked, in addition to telling a story, to answer a set of standardized 
questions. School psychologists must be especially careful in administering 
this test because of the overt sexual connotations. As in all personality 
assessment, parental permission must be granted before the test is given. 

The Blacky Pictures can be a useful tool with many research advantages. 
It is not, however, an objective or standardized instrument. Used by com- 
petent clinicians, it can be of diagnostic assistance. 


Symonds Picture-Story Test 

The Symonds Picture-Story is basically a projective technique designed 
to study the personality characteristics of adolescent boys and girls. It has 
the same test format as the TAT, but it uses a set of pictures constructed 
to study adolescent fantasy. The administration and scoring are similar to 
the TAT procedures. There are twenty pictures which may be given to 
boys or girls. 

Symonds (1948) suggests that the examiner’s main task in the content 
analysis is to record the principal psychological forces in the stories. He lists 
fourteen forces which should be noted in the content analysis: 

(1) Hostility and aggression, (2) love and erotism, (3) ambivalence, 
(4) punishment, (5) anxiety, (6) defenses against anxiety, (7) moral 
standards and conflicts, (8) ambition, striving toward success, (9) 
conflicts, (10) guilt, (11) guilt reduction, (12) depression, discouragement, 
despair, (13) happiness and (14) sublimation [pp. 10-12]. 

The Symonds test, like other clinical instruments, has little statistical 
validity but may be useful to the skilled examiner. A full explanation and 
detailed aspects of the test can be found in Symonds’ Adolescent Fantasy 
(1949). 


Children’s Apperception Test 

The Children’s Apperception Test (CAT) is very similar to the TAT, 
but animals rather than people are employed. This substitution is based on 
the assumption that children can more readily identify with animals than 
with people (Bellak, 1954). The animals are pictured in typical activities of 
humans in the same manner as in books for children. The CAT is designed 
to stimulate fantasies centering around possible areas of conflict such as 
eating, sibling rivalry, aggression, toilet-training, and other developmental 
experiences of childhood. 

The administration and scoring of the CAT, as well as its validity and 
reliability, are similar to the previously mentioned picture story techniques. 
Several studies have shown, however, no significant differences between the 
story responses of children exposed to human and animal figures (see 
Armstrong, 1954, and Furuya, 1957). It has been this writer’s experience 
that most psychologists use the TAT for both children and adults, and only 
occasionally use the CAT as a replacement for the TAT. 



Other Types of Projective Techniques 

Word Association Test 

Word association tests attempt to reveal associative connections between 
certain prescribed words and the free verbal responses of the subject. This 
technique has a long history dating back to Galton and Wundt and carried 
forward by the well-known psychoanalyst Dr. Carl Jung (1910). Jung used 
words that were common to emotional fixations. He would state a word and 
then record the subject’s verbal reaction word and the time taken to respond. 
The words, reaction time, and behavioral mannerisms while responding 
were recorded and an analysis was then made. 

Kent and Rosanoff (1910) designed a free-association test as a psychiatric 
screening instrument using common, “normal” words. These words tended 
to stimulate common associations rather than atypical responses. Frequency 
tables were developed that contained the number of times a response to a 
given word was found in 1,000 adults. If a subject replied with a different 
response, that is, if it was not found in the table, his response was labeled 
idiosyncratic. This test’s validity was weakened somewhat when other variables 
in addition to mental illness, such as age, sex, culture, and education, were 
found to influence responses. 
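
The frequency-table lookup described above can be illustrated with a short sketch in Python. It is an illustration under stated assumptions only: the stimulus words, responses, and counts below are invented and are not taken from the Kent and Rosanoff tables, and the function name is hypothetical.

```python
def classify_response(stimulus, response, frequency_table):
    """Label a response 'common' (with its tabled count) or 'idiosyncratic'."""
    tabled = frequency_table.get(stimulus.lower(), {})
    count = tabled.get(response.lower(), 0)
    if count > 0:
        return ("common", count)
    # A response that does not appear in the table is labeled idiosyncratic.
    return ("idiosyncratic", 0)


# Invented counts per 1,000 adults, for illustration only.
table = {
    "table": {"chair": 267, "wood": 76, "furniture": 75},
    "dark": {"light": 427, "night": 221},
}

print(classify_response("Table", "Chair", table))
print(classify_response("dark", "spoon", table))
```

The sketch also shows why the table itself matters: a response that is common in one age, sex, cultural, or educational group may be missing from a table built on another group and would then be labeled idiosyncratic.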

Sentence Completion Tests 

The Rotter Incomplete Sentences Blank is a good illustration of a sentence 
completion test. The Rotter comes in three forms: High School, College, 
and Adult. All three forms have forty stems such as “The happiest day” 
and “I love.” The High School and Adult forms differ slightly in the wording 
of a few items. The College form was used in the initial standardization. 
The subject is asked to express his true feelings in completing each stem. 

The usual kind of clinical interpretations of the content of the sentences 
are made. In addition, numerical scores can also be obtained. The scoring 
method is predicated on three categories: conflict or unhealthy responses, 
neutral responses, and positive or healthy responses. There is no time limit. 
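
How three categories can yield a numerical score may be sketched as follows. This is a minimal Python illustration, not the manual’s procedure: it assumes the examiner has already placed each completed stem in one of the three categories, and the weights (2, 1, 0) and the function name are assumptions made for the example.

```python
# Assumed weights for illustration; these are not the values in the test manual.
WEIGHTS = {"conflict": 2, "neutral": 1, "positive": 0}


def score_blank(ratings):
    """Tally the examiner's category ratings and return (tally, total score)."""
    tally = {category: 0 for category in WEIGHTS}
    for rating in ratings:
        if rating not in WEIGHTS:
            raise ValueError(f"unknown category: {rating!r}")
        tally[rating] += 1
    total = sum(WEIGHTS[category] * n for category, n in tally.items())
    return tally, total


# Hypothetical ratings for four completed stems.
tally, total = score_blank(["conflict", "neutral", "positive", "conflict"])
print(tally, total)
```

Under these assumed weights a higher total reflects more conflict responses. The judgment of which category a completion belongs to is still made by the examiner.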


Draw-A-Person Test 

Karen Machover is responsible for the well-known Draw-A-Person 
projective device. It is her feeling, as well as that of many others, that a 
drawing of the human figure is tied up with the personality of the individual 
doing the drawing. Machover (1949) states, 

When an individual attempts to solve the problem of the directive 
to “draw a person,” he is compelled to draw from some sources. 
External figures are too varied in their body attributes to lend themselves 
to a spontaneous, objective representation of a person. Some 
process of selection involving identification through projection and 
introjection enters at some point. The individual must draw 
consciously, and no doubt unconsciously, upon his whole system of 
psychic values. The body, or the self, is the most intimate point of 
reference in any activity. We have, in the course of growth, come to 
associate various sensations, perceptions, and emotions with certain 
body organs. This investment in body image as it has developed out 
of personal experience, must somehow guide the individual who is 
drawing in the specific structure and content which constitutes his 
offering of a “person.” 

Consequently, the drawing of a person, in involving a projection 
of the body image, provides a natural vehicle for the expression of 
one’s body needs and conflicts. Successful drawing interpretation has 
proceeded on the hypothesis that the figure drawn is related to the 
individual who is drawing with the same intimacy characterizing that 
individual’s gait, his handwriting, or any other of his expressive 
movements [p. 5]. 

Administration of the Draw-A-Person Test (DAP) is relatively simple. 
The subject is given a sheet of paper (8½ by 11 in.) and a medium-soft pencil 
with eraser and told to “draw a person.” If the subject is anxious about his 
drawing skill, the examiner reassures him that “this is not a test of artistic 
skill.” After completing the first drawing, the subject is given another sheet 
of paper and told to draw a person of the sex opposite to the figure in his 
first drawing. If the first drawing was of a man, he is told “now go ahead and 
draw a woman.” The examiner may sometimes question the subject after the 
completion of the drawings for “associations,” asking what the person is 
doing or what his age is. 

Interpretation of the DAP is based mainly on psychoanalytic theory 
(Machover, 1949). The drawing analyst looks for many things, including 
the size of the figure and where it is placed on the sheet. Shading, erasures, 
and background also lend themselves to analysis. For example, Machover 
states that “if the fist is clenched he may literally be expressing his belligerence” 
(p. 35). The types of clothing, nose (large or small), feet, neck, head, 
lips, and so forth, are also considered. 

Research studies that have attempted to validate the DAP have yielded 
conflicting data. The DAP, as in the case of its projective brothers, is not 
easily quantified. It must almost be taken on faith. It is the type of instru- 
ment in which after years of usage one may have great confidence but have 
little scientific evidence with which to prove some of its claims. It has been 
the author’s experience that it is a very valuable diagnostic tool when used 
by the skilled diagnostician as one of many measurement tools. 

House-Tree-Person Test 

The House-Tree-Person Test (HTP) was devised by J. N. Buck with 
certain preconceived ideas of what a house, tree, and person should mean ... 

Administration of the test is ... asked to draw a picture of a ... 
drawings have been completed, ... basic ... (see Fisher and Fisher, 1950). 
The Bender-Gestalt Test 

The ... Test, or Bender-Gestalt ... focuses on the subject’s ... emotions, 
attitudes, thinking, ... TAT which examines the “whole person” ... 

The Bender-Gestalt is based on ... patterns for her test. Figure 
17 presents these nine patterns. 

The Bender-Gestalt is ... instrument to evaluate intelligence, 
maturation, psychological ... brain damage. It is easily 
administered. The ... the following: I ... nine simple designs (or ...) ... 
are to copy ... sketching ... on this paper. ... show you one at a time. 
There ... psychologists use the ... evaluating the subject’s reactions. 


Evaluation of Projective Techniques 

Our review of projective techniques ... shown ... techniques feel that 
someday ... will be statistically established. ... their view ... 




Figure 17. Designs of the Bender-Gestalt Test. (From Lauretta Bender, A 
Visual Motor Gestalt Test and Its Clinical Use: Research Monographs, No. 3. 
New York: American Orthopsychiatric Association, 1938, p. 4. Copyright, the 
American Orthopsychiatric Association, Inc. Reproduced by permission.) 


This author believes that projective devices do not lend themselves to 
quantification any more than the clinical judgment of the psychiatrist or 
family physician can be experimentally justified. This does not prevent 
us from seeking help from these professionals. The positive experiences of 
clinical and school psychologists and their professional co-workers such as 
psychiatrists and social workers are testimony to the effectiveness of the 
projective method. Psychiatrists ... own evaluations and, more 
importantly, ... battery of tests, and his ... presented by all test data, 
the person’s ... not meet the requirements ... instruments or ... to use 
these devices for ... may have to refer a student to ... Certification by 
the state ... the qualifications of the person ... should be checked. School 
authorities should also guard against ... 


References 

BeAS.J. /terretaUtr. . • ■ 

^tSo^yls^bn «r 

-erneg. (2nd ed.) New Vo.k: Harper i 

"Totw^ , p A.e..ofe^.•.n-^C^’w"^^ 

19J9. s', 389-113. 



Furuya, K. Responses of school-children to human and animal pictures. Journal 
of Projective Techniques, 1957, 21, 248-52. 

Jung, C. G. The association method. American Journal of Psychology, 1910, 21, 
219-69. 

Kent, G. H., and Rosanoff, A. J. A study of association in insanity. American 
Journal of Insanity, 1910, 67, 317-90. 

Klopfer, B., Ainsworth, M. D., Klopfer, W. G., and Holt, R. R. Developments in 
the Rorschach technique. New York: World Book Co., 1954. 

Klopfer, B., and Kelley, D. M. The Rorschach technique. New York: World Book 
Co., 1946. 

Lasaga, J. I. Analytic technique. In E. S. Shneidman (Ed.), Thematic test analysis. 
New York: Grune and Stratton, 1951. Pp. 144-62. 

Levy, L. H. Psychological interpretation. New York: Holt, 1963. 

Little, K. B., and Shneidman, E. S. The validity of thematic projective interpretations. 
Journal of Personality, 1955, 23, 285-94. 

Machover, K. Personality projection in the drawing of the human figure: A method 
of personality investigation. Springfield, Ill.: Charles C. Thomas, 1949. 

Murray, H. A. Explorations in personality. New York: Oxford University Press, 
1938. 

Murray, H. A. Thematic Apperception Test manual. Cambridge, Mass.: Harvard 
University Press, 1943. 

Murstein, B. Assumptions, adaptation level and projective techniques. Perceptual 
and Motor Skills, 1961, 12, 107-25. 

Pascal, G. R., and Suttell, B. J. The Bender-Gestalt Test: Quantification and validity 
for adults. New York: Grune and Stratton, 1951. 

Rorschach, H. Psychodiagnostics. (Translated by P. Lemkau and B. Kronenberg.) 
New York: Grune and Stratton, 1942. 

Shneidman, E. S. (Ed.) Thematic test analysis. New York: Grune and Stratton, 1951. 

Symonds, P. M. Manual for Symonds Picture-Story Test. New York: Bureau of 
Publications, Columbia Teachers College, 1948. 

Symonds, P. M. Adolescent fantasy: An investigation of the picture-story method of 
personality study. New York: Columbia University Press, 1949. 



PART FOUR 


Group Standardized 
Testing 


CHAPTER 


Scholastic and Special 
Aptitude Tests 




Scholastic Aptitude ... rather than ... measure. Authorities differ in ... 

results of these tests. Some feel that they are measures of innate intelligence, 
whereas others feel that they fall short of this because of their reliance on 
factors such as culture, educational exposure, and achievement. (See Chapter 
2.) Most authorities, however, would agree that they measure with some 
accuracy, though not perfectly, an individual’s chance of success in school, 
because they measure the skills required for educational progress. One of 
their main reasons for being is to help the school understand and plan 
appropriate instruction for individual students. 

The issue of whether they in fact measure innate ability or intelligence is 
primarily of theoretical interest. The educator is concerned with educating 
students and meeting their needs. If the tests predict chances of success 
or failure in the school setting they are useful. (See Chapter 2 for a more 
complete discussion of this issue.) The educator can then make alterations 
or additions in the curriculum and tailor individual programs of study for 
children with special educational needs. 

Investigations over a period of more than forty years have shown a positive 
correlation between scores on group intelligence tests and academic success. 
Although these studies have not shown a perfect or near perfect correlation, 
there is enough evidence to demonstrate their usefulness in educational 
planning. 
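
A positive but imperfect correlation of this kind can be illustrated with a small computation. The sketch below, in Python, computes the product-moment (Pearson) correlation coefficient from paired scores. The aptitude scores and grade-point averages are hypothetical and are not data from any study cited here.

```python
from math import sqrt


def pearson_r(x, y):
    """Product-moment correlation between two equal-length lists of scores."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 scores")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return sxy / sqrt(sxx * syy)


# Hypothetical group-test scores and grade-point averages for six students.
aptitude = [85, 92, 100, 108, 115, 121]
grades = [2.1, 2.0, 2.8, 2.6, 3.4, 3.1]

print(round(pearson_r(aptitude, grades), 2))
```

A coefficient of 1.00 would mean that grades could be predicted exactly from test scores. Values between 0 and 1, like the one produced by these invented data, mean that higher scores tend to go with higher grades but that individual students depart from the trend. Python 3.10 and later also provide `statistics.correlation` for the same computation.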

Measurement experts, realizing that the group intelligence test is not 
necessarily a measure of innate intelligence but is a good indicator of school 
success, have changed the title to convey more accurately what it really 
measures, that is, a person’s scholastic aptitude. Today most test publishers 
and consumers agree with this reasoning; however, because terms are not 
easily changed, the terms IQ or general intelligence tests can still be found. 

The administration of group tests to children should not begin before the 
age of five or six (kindergarten or first grade). Below the age of five individual 
tests should be used because of the importance of assessing test behavior and 
the need for controlling the child’s attention and motivation. (See Chapter 7.) 
Even with five- and six-year-old children careful attention to their test 
behavior and understanding of directions such as how to turn pages correctly 
is extremely important if valid results are to be obtained. For example, in the 
Handbook for the Cooperative Primary Tests (Educational Testing Service, 
1967b) a pilot test is given youngsters before the scorable tests are ad- 
ministered, so that the children will know how to respond. Even with the 
pilot test, however, some children are still not sure of what they are to do. 

While experience with the Pilot Test in pretesting and norming 
situations has indicated that almost all children can answer almost 
all items on the practice test, or at least understand what they are 
supposed to do, the teacher may occasionally find a child who does 
not seem to be able to handle the tasks it presents. If, after a second 
trial with the Pilot Test at a later time, this still seems to be the case, 
the teacher is probably well advised not to go ahead to administer 
other tests in the series to this child. Interpretations from the other 
tests might be more misleading than helpful [p. 7]. 

Group tests for the first two or three grades require no reading. The 
instructions are given orally and usually one or two examples are presented 
before the testing starts. The child marks his answer in the booklet with a 

as follows: 

V-Verbal Meaning: The ability to understand ideas expressed in ... 
important single ... 

... simple quantitative problems ... understand and recognize ... 
differences. At the lower grade levels ... pictorial test that 
requires no reading. At the upper levels arithmetical reasoning 
problems are included. 

P-Perceptual Speed: The ability to ... likenesses and differences 
between ... and accurately. This ability is ... reading skills, but tends to 
plateau at a ... For this reason it is included only with the ... 
designed for the lower grades. 

S-Spatial Relations: The ability to visualize objects and figures 
rotated in space and ... them. The test measuring this ability appears 
in every ... of the PMA and is important throughout the school 
years [p. ...]. 

... following ... the Verbal Meaning ... Figure 19 contains a sampling 
of ... the test. 

The Spatial Relations Test requires the child to ... a picture. For 
example, ... of a square and the child ... 



Figure 19. Selected ... Spatial Relations Test ... (From Examiner’s Manual, PMA 
Primary ..., L. L. Thurstone and Thelma Gwinn Thurstone. Reprinted by 
permission of the publisher, Science Research Associates, Inc., Chicago, Illinois.) 


that completes it. The Number Facility Test requires the child to write 
the ... on the line. The ability to recognize similar 
objects or pictures is required in the Perceptual Speed Test. Here the child 
is asked to mark X’s on the two pictures that are exactly alike. 

The test is scored by counting all the correct answers in each section. 
Raw scores (number of correct answers) are converted to quotient equivalents 
and percentiles. Norms for each section are given as well as for a total ... 

... Tests (Kuhlmann and Finch, 1932). This series contains tests for grades one 
through twelve. Over 40,000 pupils in selected schools in thirty states were 
included in the standardization. Total time for administration is a little over 
thirty minutes. The test is largely nonverbal and the publishers claim it is 
“fair to individuals of varying cultural backgrounds.” In grades four through 
twelve answers may be written in test booklets or on separate answer sheets. 

Figure 20 presents sample items from Test Y, which is for fifth-grade 
students. It consists of five subtests, each of which is five minutes long. 
Subtest 1 requires the student to find the word that belongs in the second 
pair. Subtest 2 asks the student to find the picture with arms held like the 
first picture. A has both arms up; B has right arm up. Subtest 3 requires the 
student to study the first five numbers in each row to find out what number 
should come next. Subtest 4 presents three pictures in each row. The student 
looks at them and chooses the one of five possibilities that goes together with 
the three pictures. Subtest 5 requires the student to find the word that does 
not belong with the other four. 

The raw score is converted to a standard IQ and mental age. The publishers 
offer a complete scoring service (“one-week service”) providing rank order 
or alphabetical listing with the mental age and standard IQ. 

A well-known and respected series of scholastic aptitude tests is the School 
and College Ability Tests (SCAT). These tests were first published in 1956. 
The latest revision is the SCAT Series II (Educational Testing Service, 
1967c). In the general description of the tests the publishers state, 


Series II of the School and College Ability Test (SCAT Series II) 
was designed to provide estimates of basic verbal and mathematical 
ability. Scores will be useful in comparing a student or class with 
other students or classes, comparing performance on the verbal and 
mathematic subtests, estimating growth of these basic skills over a 
period of time, and in predicting success in related activities [p. 5]. 


The SCAT tests yield three scores: Verbal, Mathematical, and Total. 
Total administration time is between forty-five and fifty minutes. Actual 
testing time is forty minutes. There are four levels of difficulty: grades 
4-6, 7-9, 10-12, and 12-14. The format of all grade levels is essentially the 
same, differing only in difficulty of subject matter. Each has parallel forms 
comparable in content coverage and difficulty. Students record their responses 
on separate answer sheets. Figure 21 presents sample items from the SCAT 
Series II Form 4A (grades 4-6). 

The SCAT yields three scores: Part I (Verbal), Part II (Mathematical), and 
a Total. The scores for Parts I and II are obtained by counting the number of 
right answers, and the total score is calculated by combining both parts. The 
raw scores are converted to percentile bands (for example, 86th to 96th percentile) 
and percentile ranks (for example, 92nd percentile). The publishers 
state that the verbal section of the SCAT Series II correlates 0.69 with the 
English grade and the mathematical section 0.58 with the mathematics grade, 
and the total score correlates 0.68 with the grade-point average of fifth-grade 
students in five selected schools (Educational Testing Service, 1967c, p. 42). 
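
The conversion of a raw score to a percentile rank and a percentile band can be sketched in Python as follows. This is an illustration under stated assumptions, not the publisher’s conversion tables: the norm-group scores are invented, and the band is assumed here to run from the percentile rank of the raw score minus one standard error of measurement to that of the raw score plus one standard error of measurement.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(score, norm_scores):
    """Percent of the norm group scoring below `score`, counting half the ties."""
    ordered = sorted(norm_scores)
    below = bisect_left(ordered, score)
    ties = bisect_right(ordered, score) - below
    return 100.0 * (below + 0.5 * ties) / len(ordered)


def percentile_band(score, norm_scores, sem):
    """Percentile ranks of (score - sem) and (score + sem); an assumed band width."""
    return (
        percentile_rank(score - sem, norm_scores),
        percentile_rank(score + sem, norm_scores),
    )


# Invented raw scores for a ten-pupil norm group.
norm_group = [12, 15, 15, 18, 20, 22, 22, 22, 25, 30]

print(percentile_rank(22, norm_group))
print(percentile_band(22, norm_group, sem=3))
```

The band is a reminder that a single percentile rank suggests more precision than a test score has. A student’s standing is better read as falling somewhere within the band.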

Figure 22 presents selected items from other forms of the SCAT. It should 
be noted that the same format and directions are employed at different grade 
levels ; only the content of the items varies. Directions for these problems are 
given in Figure 21. 

The following items are representative of the type of questions found in 
most scholastic aptitude tests. For a more complete listing of available tests 
and their evaluations see the Mental Measurements Yearbooks. 

Verbal Meaning (Vocabulary) 

Underline the word that means the same as the first word. 

QUIET    a. Blue    b. Still    c. Tense    d. Watery 

Verbal Analogies 

Hat is to head as shoe is to ____ 
a. Arm    b. Shoulder    c. Foot    d. Leg 

Sentence Completion 

The sun sets in the ____ and rises in the east. 
a. Summer    b. Morning    c. West    d. End 

Reasoning 

Study the series of letters below. What letter should come next? 

A B A B A B A B 
a. B    b. D    c. A    d. E 

Number Series 

What number should come next to continue the series: 1 2 4 7 11 16 ? 
a. 18    b. 19    c. 20    d. 21    e. 22 

Number 

Add the following columns of numbers and underline R for right and 
W for wrong. 

(1)   16    R  W        (2)   42    R  W 
      38                      61 
      45                      83 
      --                     --- 
      99                     176 
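
The reasoning behind the number-series and column-addition items can be checked with a few lines of Python. This is a minimal sketch with hypothetical function names. It reads the series as 1, 2, 4, 7, 11, 16 and assumes the intended rule is that each difference is one larger than the last.

```python
def next_in_series(series):
    """Extend a series whose successive differences change by a constant step."""
    diffs = [b - a for a, b in zip(series, series[1:])]
    steps = {d2 - d1 for d1, d2 in zip(diffs, diffs[1:])}
    if len(steps) != 1:
        raise ValueError("differences do not change by a constant step")
    return series[-1] + diffs[-1] + steps.pop()


def column_sum_is_right(addends, printed_total):
    """True if the printed total matches the sum of the column."""
    return sum(addends) == printed_total


print(next_in_series([1, 2, 4, 7, 11, 16]))   # differences 1, 2, 3, 4, 5, then 6
print(column_sum_is_right([16, 38, 45], 99))   # column (1)
print(column_sum_is_right([42, 61, 83], 176))  # column (2)
```

Under that reading the series continues with 22 (choice e), column (1) is marked R, and column (2) is marked W because its true sum is 186.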




Figure 21. Directions and examples for Part I and Part II of the SCAT
Series II. (Used by permission of the publisher.)




GROUP STANDARDIZED TESTING 


Arithmetic Reasoning

Four $10 bills are equal to how many $5 bills?
a. 20 b. 40 c. 10 d. 2 e. 8

Abstract Reasoning

All four-footed creatures are animals. All horses are four-footed.

Therefore: a. Creatures other than horses can walk.

b. All horses can walk.

c. All horses are animals.

College admissions tests are also scholastic aptitude tests. They will be
presented and analyzed in Chapter 11. Tests that yield a scholastic aptitude
test score but are also intended for vocational and other uses will be discussed
in the following section along with professional tests.


Special Aptitude Tests 

In this section we discuss vocational aptitude tests and batteries and special 
tests in art and music. In addition, a brief review of professional aptitude 
tests is presented. 

The main function of the aptitude test is to measure the potential capacity
of an individual. Its job is not to measure what has been learned, but what
can be learned. Although aptitudes are generally thought of as being
completely apart from training, it is impossible to isolate any aptitude from
some kind of learning experience. Thus, aptitude tests may indirectly measure
what has been learned, as well as what can be learned. Their main objective,
however, is to measure the potential to learn, whether in school, the creative
arts, or vocational pursuits.

Aptitude and interest are often thought of as being equivalent. They are
not. The youngster who likes to fix things around the house, the adolescent
who takes apart an automobile, the boy or girl who spends time playing the
piano or painting a picture are showing interests. Whether they are revealing
abilities or talents is another question. On the other hand, the individual who
takes apart an automobile or practices the piano develops skills which enhance
ability. Adults may indeed pursue vocations that coincide with their interests,
abilities, and skills. Children, however, have not generally experienced
enough different activities to judge their own particular abilities and interests
adequately. Also, a child or adult may persist in an activity simply because
he has developed some skill (learned ability) in that activity, although he has
neither high ability nor interest. One job of the school is to expose the child
to new experiences and activities so that he may “try them on for size,”
hoping that something will fit, and he may choose a career in which his
interests, skills, and abilities coincide. The objective aptitude test provides



Figure 22. Selected items from the School and College Ability Tests (SCAT
Series II): Grades 7-9 (Form 2A), Grades 10-12 (Form 2A), and Grades 12-14
(Form 1A). (Copyright by Educational Testing Service. Used by permission.)




questions as, “When should a child start in a reading program?” and “Is the
student ready to begin his study of algebra?” The school must have
meaningful data about its student body. It especially needs to know
individual differences. Information is required to help adapt
instruction to the needs and abilities of the pupils, who differ
in their range of talents. In addition, information on the strengths and
weaknesses of each pupil presents a valuable background for individual
vocational and personal counseling.

It is extremely important for a person’s future adjustment that educational
and vocational planning be made intelligently. Many cases of educational
and vocational maladjustment could have been avoided if proper
guidance had been available. Many times, students at the ninth or tenth grade
seem to have no real problem in making vocational decisions. Their plans
seem intelligent enough from the viewpoint of the school, their family, and
themselves. Others, however, are not as fortunate,
although some are not aware of their situation. Many adults state, “If only I
had known”; the school attempts to avoid this feeling in future adults by
helping the child “know” while there is time to make an intelligent decision.
Other students may be unhappy about their achievement but believe their
status is inevitable.

Thus it is obvious that the school needs not only to have all possible
information on each individual student for program planning, but also to
know each student’s abilities in order to provide personal counseling. An
aptitude test or battery of tests can help in this work. The counselor who has
objective data about a youngster’s aptitudes can help the child work toward a
constructive utilization of his abilities.


Vocational Aptitude Tests 

Aptitude was previously defined as the ability to learn. Thus vocational
aptitude tests attempt to measure a youngster’s ability to learn in certain
occupations. They do not pinpoint a person’s exact career; however, they
do provide answers to such questions as, “Is it realistic to consider medicine
as a career?” “Can Mary consider a job as a secretary?” “Would Bill be
better suited to be a mechanic or an office worker?” “Should John go to
college, and if so, what type of school, technical or general?”

The school needs to have information on a youngster’s aptitudes to guide
him intelligently into various educational programs and occupations in
which he has a realistic chance of succeeding. It is important to remember,
however, that aptitude tests will not make the decisions for an individual





SCHOLASTIC AND SPECIAL APTITUDE TESTS 
but will provide useful data in planning future objectives. 


Clerical Aptitude Tests 

Test instruments designed to measure clerical aptitude all have in common
an emphasis on perceptual speed. . . . (1935) . . . typewriting, bookkeeping,
and . . . is also needed, . . . between 95 and . . . . They state further that,
when promotability is a . . . intelligence should be heavily . . .

Detailed studies of clerical . . . and dexterity are very important. Our . . .
mainly measure speed and accuracy.

The Minnesota Clerical Test (The Psychological Corporation) is one of
the better known . . . aptitude. It . . . and over, and there are two columns
. . . two members of each pair . . . results are scored for speed . . . very
short time . . .

Items that are similar to the tasks on this test follow:


Number checking
7345 

317M 31789 

85634 SS“4 


* Originally called the Minnesota Vocational Test for Clerical Workers.





Name checking

John G. Smith            John C. Smith
The Chase Fuel Co.       The Chase Fuel Co., Inc.
Alger R. MacDonald       Alger R. MacDonald


Another well-known test of clerical aptitude is the General Clerical Test
(GCT). On this test there are four separate scores: clerical, verbal,
numerical, and total. These scores are derived from nine subtests. The first
two, checking and alphabetizing, are designed to measure speed and accuracy.
The verbal score is ascertained by combining spelling, reading
comprehension, vocabulary, and grammar, and the numerical score is obtained
by tests of arithmetic computation, error location, and arithmetic reasoning.

Many more tests of clerical aptitude, such as the Purdue Clerical
Adaptability Test (Lawshe, Tiffin, and Moore, 1936) could be mentioned; however,
our two examples serve as illustrations of the basic format of the clerical
aptitude test. In our discussion of the Differential Aptitude Tests later in this
chapter, we shall review one other clerical test contained in that battery.

Super and Crites’ (1962) comments on the Minnesota Clerical are quite
germane for most clerical tests. They state,


When appraising clerical promise it is well to use tests of both
perceptual speed and intelligence. . . . When the test is used at the
junior high school level for curricular guidance purposes, grade
norms are to be preferred. It seems wise to use adult norms
even with high school juniors and seniors. . . . In using the adult
norms, emphasis should be on the occupational rather than on the
general norms. . . . As a rule speed on this type of test is a good
measure of accuracy. But there are occasional exceptions, and one
subject will make a given score by working rapidly with errors,
whereas another will make the same score by working more slowly
without errors. For this reason the psychometrist or counselor should
examine the responses to each test, and take the error score into
account in making his interpretation [pp. 178-79].
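
The point about the error score can be made concrete with a small sketch. It assumes, purely for illustration, that a clerical checking test is scored as the number right minus the number wrong; the class name, the field names, and the figures are hypothetical and are not taken from any test manual.

```python
from dataclasses import dataclass


@dataclass
class ClericalResult:
    """One examinee's work on a timed checking test (hypothetical record)."""
    attempted: int  # items answered before time was called
    errors: int     # items answered incorrectly

    @property
    def score(self) -> int:
        # Assumed scoring rule: number right minus number wrong.
        right = self.attempted - self.errors
        return right - self.errors

    @property
    def error_rate(self) -> float:
        # Share of attempted items that were wrong.
        return self.errors / self.attempted if self.attempted else 0.0


fast_but_careless = ClericalResult(attempted=120, errors=10)
slow_but_exact = ClericalResult(attempted=100, errors=0)

for label, result in [("fast", fast_but_careless), ("slow", slow_but_exact)]:
    print(label, result.score, round(result.error_rate, 3))
```

Under this assumed rule both examinees earn the same score, yet only one of them made errors. The score alone hides that difference, which is why the counselor is advised to look at the error count as well.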


Mechanical Aptitude Tests 

In the United States the emphasis is on college training, and yet there is
a large segment of our student population who, because of ability, interest,
or other reasons, will not attend college. These youngsters may have
mechanical ability and they should not be forgotten by their community,
for society needs mechanical craftsmen as well as doctors and lawyers. The
school that plans programs for these youngsters needs some basis of objective
appraisal in selecting students and arranging appropriate courses of study.
The mechanical aptitude test can serve this end and other vocational needs
very well.




It has been discovered that some mechanical jobs require the ability to
. . . assembled from its parts, how a . . . different point of view, and how
. . . movements of another. Test questions that measure this . . . be
illustrated later in the chapter.

It is important to remember that . . . or abilities are sometimes . . .
aptitude. Some instruments, . . . between boys and girls.

Some teachers and counselors . . . confused by the results of mechanical
aptitude tests. They assume, for . . . a student is slated to be an engineer
. . . results as indicative of lower scholastic ability. . . . fraught with
possible errors. First of all, mechanical aptitude is only one of many
abilities an engineer needs to be . . . mechanical aptitude, the
aspiring engineer must have a good background in science and mathematics,
as well as general scholastic ability. Second, . . . scholastic ladder is
. . . think of mechanical ability . . . only paper and pencil . . .

Individual Tests

Individual . . . to use tools and materials . . . blocks, as well as . . .
such devices as a push button or a . . . Motor ability and dexterity are
important ingredients in these tests.

One of the most . . . best-known tests . . . dexterity is the
Minnesota Rate of Manipulation Test. The age range . . . thirteen to fifty.
The test consists of a formboard . . . There are sixty identical holes, with
fifteen holes in each row. . . . identical discs, a little larger than . . .
The examiner places the discs . . . and asks the subject to place them in
their correct . . . The total testing . . . examination is from six to . . .

Another widely used test is the O'Connor Finger and Tweezer Dexterity
Tests. These tests are used with adolescents and adults. The test consists of
a shallow tray beside a metal plate which has 100 holes arranged in ten rows
of ten holes each. Every hole is big enough to contain three metal pins,
1 inch long. The Tweezer Dexterity Test is on the opposite side of the board
and also has 100 holes, but these are just slightly larger than the pins,
which allows the subject to place one in each hole. A pair of tweezers is
used to pick up the pins.

In the Finger Dexterity Test the subject picks up three pins and places
them in each hole, whereas in the Tweezer Test he picks up one pin at a
time and attempts to place it in a hole. The score is computed on the basis of
the total time required to complete the tasks.

Another test that is very popular with counseling psychologists is the
Purdue Pegboard. The Purdue Pegboard is a 12- by 18-inch rectangular
board with four cups which hold the test materials at one end and two rows
of holes straight down the middle. The examiner first administers the test
by asking the subject to put metal pins in the holes one at a time with his
right hand. After completing this assignment, the subject is asked to do it
again with his left hand. The examiner then asks him to do it with both hands
simultaneously. The final task is to assemble the metal pin, metal washer,
metal collar, and washer using the right and then the left hands and then
both. Scoring is computed on the basis of the number of pins placed in
thirty seconds and the number of assemblies made in sixty seconds.
Generally speaking, motor tests have been most successful in the prediction 
of performance on assembling and machine-operating jobs. (See Fleishman, 
1953.) It should be noted, however, that jobs requiring less repetitive tasks 
demand more perceptual and intellectual abilities. The important thing to 
remember in assessing the validity of these instruments is the relationship 
of the test tasks to the actual job specifications. 


Paper-and-Pencil Tests 

Because of their convenience, paper-and-pencil tests of mechanical ability
are used much more widely than individual tests in schools. Among the better
tests of this type is the Bennett Test of Mechanical Comprehension. This
test is made up of items consisting of drawings. The items are concerned
with the application of physical principles. If a student has not studied
physics he will not be at a disadvantage in this examination, because knowledge
of mechanical equipment is not being tested. To illustrate this a sample
question from the Test of Mechanical Comprehension is shown in Figure 23.

There are sixty items in the test. Although there is no time limit, most
subjects finish within twenty-five minutes. There are seven forms which
vary in difficulty, normative group, and language. For example, Form AA
has percentile norms for school and industrial groups, whereas Form BB
is geared to a higher level, including employed technicians and engineers.
Form CC (Owens-Bennett) is especially directed to engineering students,
and Form W1 (Bennett-Fry) is geared to women. In the area of language Form




Figure 23. Sample item from Test of Mechanical Comprehension, Form BB.
(Reproduced by permission. Copyright 1941, The Psychological Corporation,
New York, N.Y. All rights reserved.)


AA-F is the same as Form AA, but the instructions and questions are in
English and French. Form AA-S is in Spanish, using norms from Cuba.
Form BB-S is also in Spanish, but the normative group is from Venezuela.

The Bennett Test is especially useful for predicting success in jobs that 
require the ability to understand machines. Engineers and toolmakers score 
very high on the Bennett. 

The MacQuarrie Test for Mechanical Ability is a very old test originally
developed as a rough measure of mechanical and manual aptitude. It is a
battery of seven subtests each designed to evaluate different factors assumed
to be important in mechanical jobs. The first three (Tracing, Tapping, and
Dotting) are measures of manual dexterity; the next three (Copying,
Location, and Blocks) are tests of spatial perception. The last subtest,
Pursuit, is a test of perceptual speed and accuracy. These differences in
content have tended to make most users of the test treat each part separately
in validation studies (Super and Crites, 1962).

This group test requires about a half-hour for administration. For a detailed
examination of this test and other group and individual mechanical tests,
see Super and Crites (1962, pp. 219-74).

Test Batteries

Test batteries have been developed to measure many things, including
intelligence, general school achievement, and different vocational aptitudes.
We shall confine our attention here to the batteries primarily concerned with
vocational prediction.



Many studies concerning vocational prediction have been made. In one
such study a group of high school students was given a vocational test
battery. Two years after they completed high school, a comparison of their
educational and vocational situation and their test scores was made.
The study revealed that premedical students had scored very high on
all the tests in the battery. Workers in mechanical and electrical trades
were above average on the mechanical test and average or below on the
other types of tests. This study revealed no definite evidence of success or
failure based on aptitude tests in specific occupations. Many studies,
however, do show certain trends that can give us some clues for vocational
guidance. For example, successful workers in skilled trades do well on
certain tests, whereas successful clerical workers generally have high scores
on different tests.

Teachers and counselors should be very cautious in interpreting aptitude 
test results. It should be remembered that aptitude test results are most valid 
when other tests, such as group tests of intelligence, achievement scores, and 
the general performance record of the student in school and at home, are 
taken into consideration. 

There are numerous vocational aptitude batteries. However, the battery
that teachers are most likely to come in contact with is the Differential
Aptitude Tests (DAT). There are two very good reasons for this. First, the
DAT lends itself to a school setting in terms of its format and norms. It also
yields a scholastic aptitude score, which makes the test useful in another area
and eases the financial burden of purchasing an additional instrument. The
second reason for the DAT’s wide use is its careful attention to standardization
procedures and the many excellent reviews of its merits by testing
experts. For these reasons we will devote our attention to the DAT. The
reader who is interested in reviewing other test batteries* should consult the
Mental Measurements Yearbooks.

The DAT consists of eight tests and is available in two forms, L and M,
with two booklets for each form. Each booklet has four tests. There is also a
verbal reasoning and numerical ability combination booklet for use as a
separate measure of scholastic aptitude.

The DAT was designed for use in the junior and senior high school as
an aid in educational and vocational guidance. It is based on the assumption
(backed by some research findings) that “intelligence” is not a single ability,
but a combination of several abilities. The battery yields nine scores based
on the eight tests. (Two of the eight scores combined yield an index of
scholastic aptitude.) A great deal of research on the DAT has been
reported, some of which may be found in the manual (Bennett, Seashore,


* Another battery is the Flanagan Aptitude Classification Tests (FACT).
It is not reviewed here because it is geared more for
industrial uses and takes longer to administer.



Figure 24. Sample items from Booklet 1 of the Differential Aptitude Tests:
Verbal Reasoning, Numerical Ability, Abstract Reasoning, and Clerical Speed
and Accuracy. (All rights reserved.)


and Wesman, 1966).* We shall explore each of the tests to obtain some
insight into their content and specific use in vocational and educational
guidance.
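
The arrangement of nine scores from eight tests can be sketched as follows. The sketch assumes that the scholastic aptitude index is a simple sum of the Verbal Reasoning and Numerical Ability raw scores, formed before any conversion to norms. The raw scores and the function name are invented for illustration and do not come from the DAT manual.

```python
# Hypothetical raw scores on the eight DAT tests (values invented).
raw_scores = {
    "Verbal Reasoning": 32,
    "Numerical Ability": 25,
    "Abstract Reasoning": 36,
    "Clerical Speed and Accuracy": 52,
    "Mechanical Reasoning": 41,
    "Space Relations": 30,
    "Spelling": 70,
    "Grammar": 28,
}


def dat_profile(scores):
    """Return the eight test scores plus the combined VR + NA index."""
    profile = dict(scores)  # copy so the original record is unchanged
    # Assumed rule: the ninth score is the sum of two of the eight.
    profile["VR + NA"] = scores["Verbal Reasoning"] + scores["Numerical Ability"]
    return profile


profile = dat_profile(raw_scores)
print(len(profile), profile["VR + NA"])
```

The ninth score adds no new testing time; it only combines information the pupil has already supplied, which is part of the battery's economy for schools.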

Figure 24 presents sample items from Booklet 1 of the DAT. The Verbal
Reasoning test requires the student to choose from among five pairs of words
the correct combination to complete the blanks. In the Numerical Ability
test five answers follow each problem. The student's task is to pick the correct
answer or “none of these” if the correct answer is not given. The Abstract
Reasoning test consists of “problem figures” and “answer figures.” The four
“problem figures” make a series. The student is required to find out which
one of the “answer figures” would be the next, or the fifth, one in the
series. The Clerical Speed and Accuracy test is a test to see how quickly an
individual can compare letter and number combinations. Each test item
contains letter and number combinations. These same combinations are
on a separate answer sheet but are in a different order. In each test item
one of the five is underlined. The student's job is to look at the underlined
combination, find the same one on the separate answer sheet, and
record his answer.

Figure 25 reveals sample items from Booklet 2 of the DAT. The Mechanical
Reasoning test consists of pictures which require the student to make
judgments concerning the “truth” of certain situations involving balance, weight,
and other mechanically related problems. The Space Relations test is made
up of patterns which can be folded into figures. Four figures are shown for
each pattern. The student is asked to decide which one of the figures can be
made from the pattern shown.

The Language Usage test contains two sections: the Spelling test
and the Language Usage Grammar. The Spelling test presents a series
of words some of which are correctly spelled and some of which are
incorrectly spelled. In the grammar section the student is confronted with
a series of sentences divided into four parts (A, B, C, and D). The task
is to look at each sentence and decide which part is in error grammatically.
If the entire sentence is free of error, the student marks “E” on his answer
sheet.

The publisher of the DAT has produced a pamphlet entitled “Your
Aptitudes as Measured by the Differential Aptitude Tests,” an excellent
description of each of the tests and what they measure. It is also a good
source for understanding any aptitude test battery and is written for students,
parents, teachers, and counselors. The reader is encouraged to read the
following partial reproduction of this pamphlet carefully, not only for its
description of the DAT but as a general guide to what many vocational
(and some scholastic) aptitude batteries measure.

* A very good bibliography of over J20 references, many of which refer to research
mainly bearing on the validation studies of the DAT through 1964. For further
references, see Mental Measurements Yearbooks.




Figure 25. Sample items from Booklet 2 of the Differential Aptitude Tests:
Mechanical Reasoning, Space Relations, and Language Usage. (The
Psychological Corporation.)


What Is Aptitude?*

Simply— aptitude is the capacity to learn. You take aptitude tests 
in order to be able to make better predictions of how you can expect 
to develop in school and in a job. 

Your DAT scores, then, are measures of your capacities to learn 
to profit from various courses of study or from training required for 
jobs you may seek. 

These tests give you a way of comparing your abilities, as of now,
with those of boys and girls in your grade. The test results will help
you evaluate your relative strengths and weaknesses in a variety of
aptitudes which are important to your educational progress and your
career choices.

Think of the DAT scores simply as useful bits of information.
You will want to consider these test scores along with all sorts of
other information about you that have already piled up: in your
mind, in the school records, and in your family’s thinking. Of course,
you will want to take into consideration such facts as your school
grades and other test scores; the things you like most, hobbies, and
out-of-school interests; what courses are available in your school;
your ambitions; your qualities of character and traits of personality
such as curiosity, skill in getting along with others, and ability to
stick with duties and hard tasks; job requirements; health; college
entrance requirements; and so on. There are a great many things to
take into account.

Aptitude tests will not pinpoint for you exactly what your career 
should be. These tests do not provide specific prescriptions or answers 
to such specific questions as: Can I be a plumber? Should I plan to 
become a physician? Should I be a dress designer? 

But, if you and your counselors will study your DAT scores along
with other information, you can get answers to some more general
questions such as: Is it reasonable for me to consider medicine as a
career? Which would be the better job for me, mechanic or office worker?
What are my particular assets and liabilities to be considered if I am
thinking of becoming a secretary? Would I profit from a college education?
What type of college?

Information on your aptitudes can start you off on a meaningful 
study of the various educational programs and occupations you might 
want to consider. The freedom you have in planning your future 
places on you considerable responsibility for realistic thinking about 
yourself. 


* Psychological Corporation, Your Aptitudes as Measured by the Differential
Aptitude Tests. Reproduced by permission. Copyright © 1961, The Psychological
Corporation, New York, N.Y. All rights reserved.





VERBAL REASONING

Verbal reasoning is important in all academic and most nonacademic
subjects in high school. If you were to take only one test,
VR would be the best all-around predictor of how well you can do
in school, especially in the academic subjects. Students who score
average or better should seriously consider college; those well up in
the top quarter may consider the highly selective colleges.

Students above the bottom quarter on VR but without a college
education may be acceptable for various supervisory and managerial
jobs in business and industry. Other things being equal, for instance,
the employee with more verbal reasoning ability than his fellow
workers has a better chance of being selected for special training in
technical work or in supervision.

Students not planning for college who have VR as the peak on
their profile should consider preparing for such verbal occupations
as salesman, credit manager, order taker, complaint clerk. These
job names will help you think of others also in which verbal reasoning
and understanding are essential.

People who do poorly on the Verbal Reasoning test should perhaps
plan on going into some work that will call for less verbal ability. A
person can be successful doing clerical work in an office without
trying to become head of a department, or successful doing production
work in a factory without expecting to become production manager.

If your scores on one or both of the Language Usage tests
(Spelling and Grammar) are an inch or more below the VR on the
profile chart, there is a real chance that you aren’t able to use your
verbal reasoning ability up to its full capacity. Talk with your counselor
and teachers about what you can do to improve your writing, reading,
and other language skills.

NUMERICAL ABILITY

Numerical ability is especially important in such high school
subjects as mathematics, physics, and chemistry.

Students who do well on this test are also likely to do well in the
arithmetic and measuring so common in business offices, factories,
service shops, and stores.

Scores on this test predict, to some extent, success in nearly all
high school and college courses. Numerical ability is one element of
all-around ability to master academic work.

An above-average score in NA suggests planning for college or
other post-high school education. A student who wants to major in
such fields as mathematics, physics, chemistry, or any branch of
engineering, may expect to encounter some difficulty if his NA score
is not in the top third or top quarter.




Numerical ability is also useful in technical careers not requiring a
college degree. A score in the second or third quarter on this test,
especially if scores on Verbal Reasoning and/or the two Language
Usage tests are noticeably lower than the NA score, suggests looking
at technical training programs either in companies or in training
institutes for trades and crafts.

Numerical ability is useful in such jobs as laboratory assistant, 
bookkeeper, statistical clerk, foreman, or shipping clerk. Many of the 
jobs in the skilled trades in manufacturing or construction work 
require considerable numerical ability. 

VR + NA

Your combined score on these two tests provides a good estimate
of your scholastic aptitude: your ability to complete the college
preparatory courses in your school and to succeed in college.

In general, anyone with a rating in the upper quarter (75th percentile 
or better) should consider himself capable of performing well in 
college courses. Depending on your current ambitions and your choice 
of college, a second quarter rating on VR + NA also indicates 
college potential. Whether students ranking in the third quarter 
should enter regular liberal arts and science programs is arguable. 
Are you doing very well in high school? Are you prepared to work 
harder than your college mates? What college and what courses are 
you considering? Some students in the third quarter and a few in the 
fourth quarter who want some post-high school education will find it 
practical and satisfying to enter one-year or two-year junior college 
programs in applied arts and sciences, business training, and the like. 

Besides predicting academic success, the VR + NA score gives
some indication of aptitude for jobs that require more than the average
level of administrative and executive responsibility.

ABSTRACT REASONING

Using diagrams, the Abstract Reasoning test measures how easily 
and clearly you can reason when problems are presented in terms of 
size or shape or position or quantity or other non-verbal, non-numerical 
forms. The repairman troubleshooting an unusual breakdown, 
the chemist, physicist or biologist seeking to understand an 
invisible process, the programmer planning the work of an electronic 
computer, and the systems engineer all find this ability useful. Carrying 
out a logical procedure in your mind is important here. 

Abstract Reasoning teams up with the next two tests, Space 
Relations and Mechanical Reasoning, in prediction of success in 
many kinds of mechanical, technical, and skilled industrial work. 

Students standing high on Verbal Reasoning and Numerical 
Ability have added confirmation of their college ability if they are 
also above average on Abstract Reasoning. But, if VR and NA are 
high and AR is below average, they usually may rely on the verbal-numerical 
combination to see them through. 

Students scoring rather low on VR but fairly high on AR have 
evidence of ability to reason in certain ways despite a verbal short- 
coming. Vocabulary building, remedial reading, and similar exercises 
may help strengthen verbal reasoning power. 

CLERICAL SPEED AND ACCURACY 


Clerical Speed and Accuracy measures how quickly and accurately 
you can compare and mark written lists such as of names or numbers. 
This is the only one of these tests that demands fast work. It is very 
easy to get the right answer; speed in doing a simple task is what 
counts. Girls tend to score higher than boys on this test. 

While CSA measures an ability that is useful in many kinds of jobs, 
it is not really needed or expected in most high school courses. In 
most school work it is more important to do your work correctly 
than to do it quickly. But a very low score sometimes indicates a source 
of difficulty with homework or exams. 

Have you done well on others of the Differential Aptitude Tests 
but not very well on this one? If so, perhaps you did not work as fast 
as you could have worked. By practicing, you may be able to speed up 
quite a bit without sacrificing accuracy on tasks that you understand 
well. 

Aptitude for CSA is important in many kinds of office jobs, such 
as record-keeping, addressing, pricing, order-taking, filing, coding, 
proofreading, and keeping track of tools or supplies. Secretaries, 
whose most important skills must be in stenography and office 
services, are better if they also can work fast and accurately on routine 
clerical tasks. 

In most scientific research and much professional work, mistakes 
in recording or copying can be very serious. But speed is needed, 
as well as accuracy. A good score on CSA is desirable, then, for a job 
handling data in a laboratory as well as for a job in bookkeeping or 
in a bank. 


MECHANICAL REASONING 

Students who do well on the Mechanical Reasoning test usually 
like to find out how things work. They often are better than average 
at learning how to construct, operate, or repair complicated equipment. 
While VR and NA are the best predictors of science and 
engineering grades in college and technical institutes, a high MR 
score is added evidence of ability in these fields. 

Students who do well on this test but whose VR and NA scores 
suggest that a college engineering course might be very difficult 
should look into opportunities in high school technical courses, 
apprentice training, and post-high-school technical institutes. Men 
in industry who become technicians, shop foremen, and repair 
specialists tend to be at least average in MR. 

People who do poorly on this test may find the work rather hard or 
uninteresting in physical sciences and in those shop courses which 
demand thinking and planning, rather than just skill in using one’s 
hands. Many types of work in the construction and manufacturing 
trades also require one to understand machinery and other uses of 
physical forces as well as to have manual skills. 

Girls score considerably lower than boys on the MR and SR tests. 
Therefore a girl who does quite well on these tests, as compared with 
the average girl, may still be far below the average boy. A girl interested 
in mechanical or engineering work should ask her counselor to figure 
her MR and SR percentiles in comparison with boys as well as with 
girls. 
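The point about comparing one raw score with two different norm groups can be illustrated with a short sketch. Everything in it is hypothetical: the function name, the definition of percentile rank used (percent of the norm group scoring below the given score, one of several common definitions), and the two small lists of made-up scores, which are not DAT norms.

```python
from bisect import bisect_left


def percentile_rank(score: float, norm_scores: list[float]) -> float:
    """Percent of the norm group scoring strictly below `score`.

    Assumption: this "percent below" definition; published norms
    tables may use a different convention.
    """
    if not norm_scores:
        raise ValueError("norm group must not be empty")
    ordered = sorted(norm_scores)
    below = bisect_left(ordered, score)  # count of scores < score
    return 100.0 * below / len(ordered)


# Invented raw scores for two norm groups (illustration only).
girls_norms = [20, 24, 28, 30, 33, 35, 38, 40, 44, 50]
boys_norms = [30, 35, 40, 44, 47, 50, 53, 56, 60, 65]

if __name__ == "__main__":
    raw = 44
    print("vs. girls:", percentile_rank(raw, girls_norms))
    print("vs. boys: ", percentile_rank(raw, boys_norms))
```

With these invented numbers the same raw score ranks high in one group and well below the middle in the other, which is the situation the paragraph above describes.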


SPACE RELATIONS 

Space Relations measures your ability to visualize, to imagine the 
shape and surfaces of a finished object before it is built, just by looking 
at the drawings that would be used to guide workmen in building it. 
This ability makes some kinds of mathematics easier — solid geometry, 
for example. 

To a person who does poorly on Space Relations, an architect’s 
plans for a house or an engineer’s plans for a bridge or a machine 
might look like nothing but several flat drawings. But how about a 
person who does well on this test? Such a person looking at those 
same plans can “see” the finished house, or bridge, or machine. He 
could probably “walk around” the finished structure — mentally, that 
is — and “see” it from various angles. 

Students who do well on SR should have an advantage in work 
such as drafting, dress designing, architecture, mechanical engineering, 
die-making, building construction, and some branches of art and 
decoration. A good machinist, carpenter, dentist, or surgeon needs this 
sense of the forms and positions of things in space. 

Students planning for careers not requiring college training should 
consider their SR score in comparison with their other aptitudes in de- 
ciding whether to look for jobs (or training courses) that deal with 
real objects — large or small, watches or skyscrapers — rather than with 
people or with finances, for example. 

LANGUAGE USAGE 

Language Usage is composed of two short achievement tests 
which measure important abilities you need to consider along with 
the other aptitudes assessed by the DAT. 



Z' *■'" ? English 

llaZ lyprand Znh ”?• " “ 

person can recognize mistakes in the 
grammar, punctuation, and wording of easy sentences. It is among the 

and'cZZ'°'^ '“'s'’ 

While some careers, such as writing and teaching, call for a high 
degree of competence in English, all careers requiring college-level 
education require good language skills, and so do most office and 
managerial jobs in business and industry. 

If you do well on both of these tests and on VR, you should be able 
to do almost any kind of practical writing provided you have a 
knowledge of your topic and a desire to write about it. 

On the other hand, a student fairly high in VR but low on either 
or both of these two language tests probably can profit from special 
study or tutoring in English to bring his language skills up to the 
level indicated by his VR score. 


The student of measurement should keep in mind the fact that vocational 
aptitude batteries alone will not solve educational and vocational problems. 
They will, however, provide valuable information for students and others 
in planning future goals when interpreted in conjunction with other 
evaluative criteria. 


Prognostic Tests 

The aptitude tests used to predict school and artistic success or failure 
are called prognostic tests by psychologists. The inquiring student may rightfully 
ask whether this type of special test is better than an academic aptitude 
test. The answer is not easy. Many testing people feel that some day the 
general aptitude test, such as the Differential Aptitude Tests, will take the 
place of aptitude tests designed for special fields. Today, however, prognostic 
tests have an important place in the school’s testing program. Prognostic 
tests are especially useful in spotting children who may be able to perform 
in special academic areas. The important thing to remember is that special 
aptitude tests can predict failure more accurately than success, for success is 
in part determined by motivation, social pressures, and other factors. In 
general terms we may state that a person with superior intellectual endowment 
may or may not be successful in college, but we can be fairly certain 
that an individual with very low ability will be unable to succeed academically. 
Let us, then, look at some of these special aptitude tests. 


Reading Readiness Tests 

Reading readiness tests are generally given in the 
first year in school. They help the school in gathering some indication of the 
child's ability to progress in reading. For example, Miss Smith, a first-grade 
teacher, wants to know which children are ready for reading and which 
children may have difficulty. She decides to use a reading readiness test to 
help answer this question. Upon reviewing the test scores, she finds that 
some children are ready to read and others are not. With this knowledge she 
can divide the children into groups of similar readiness and have each group 
work at its own level. She can use the results of the tests as a guide in starting 
a formal reading program and in deciding what type of prereading activities 
she may provide for the children. 

Teachers should not feel that the scores their students receive on a reading 
readiness test will necessarily be an indication of the child’s final level of 
reading achievement. A reading readiness test is used mainly to predict the 
ability of a child to learn from reading instruction in the first year of school 
and many times only in the first few months. Actually, a better source for the 
prediction of final reading achievement is a scholastic aptitude test. In 
addition, it must be remembered that each child's rate of development is 
different. For example, Bill may start walking at an earlier age than his 
brother Jim — even though Jim may start to talk earlier than Bill. In the same 
way some children are ready to read at five years of age, or even sooner, 
whereas others are not ready until they are seven or eight years old. It must 
be remembered also that there is an age difference of as much as eleven 
months among individual children placed in the same group at the first-grade 
level. At this age a few months make a great deal of difference in physical 
and intellectual maturity, and thus affect what a child is capable of learning. 

There are many reading readiness tests, each having different kinds of 
tasks. Some require rhyming or matching sounds, and others use oral 
vocabulary with pictures. For example, in the latter type of problem the 
child is asked to identify a picture of an object that the teacher names. The 
teacher may say “cat” and ask the children to circle the picture of a cat 
from among pictures of a cat, a dog, a horse, and a bicycle. Almost all the 
reading readiness tests require the child to be able to match figures or simple 
words by sight. The test item may show a star and beside the star four 
figures: a star, a circle, a square, and a diamond. The child must be able to 
“memorize” the star and pick the star figure from among the four 
figures to get the item right. 

The Metropolitan Reading Readiness Test is one of the widely used 
readiness tests. It is made up of six tests. The first is called Word Meaning. 
The person doing the testing presents four pictures to the child and verbalizes 
a word that would identify one of the pictures. The child is then asked to 
point to the picture that is the same as the word. In the second test the 
examiner shows the child four pictures, but this time instead of calling out a 
word, he states a phrase or sentence. The child is then asked to point to the 
picture that is the same or means the same as the phrase or sentence. This 
test is called Sentences. The third test, Information, is similar to the first 
two tests, except that here the child is called upon to point out objects in 
terms of what they do. He may, for example, be presented with pictures of 
four objects including a camera and be asked to “mark the one you would 
take a picture with.” The fourth test, called Matching, requires the child to 
show his ability to recognize similarities and differences in pictures of 
objects, numbers, and letters. The fifth test, Numbers, involves simple 
arithmetic problems. In the sixth and last test, Copying, the child is asked to 
copy simple forms, numbers, and letters. This test attempts to find out about 
a youngster's physical and intellectual maturity. 

Another approach in — ^ 

Pive developmental examinations. Inesc range iron nf 

liven at the Gesell Institute of Child 

maturation. Let us briefly turn 3, ,he Gesell Institute 

The developmental “ ,,55, seven different parts; 

of Child Development (Ilg and Ames, IDOb) tan into 

1. ne initial ■'d-a- Questions abou^g^^ 
party including favorite activity and present 
• j fotK«»r’soccuDauon. .. 


dimensional forms g\(,5“giving his facial expression, 

pleting / iron's m< '“I'?' 

3, Right and Uft olt tiaglt ani iatMa 

Naming parts pfctulcs of a pair of hands m 

rert;^o%':;l-sr-po^ u .st verba, and then 
"'U*™- , Cl mlonroel— matching forms; Visual Three 

, Wmiiiav 0 / aniatal’ for 60 sternds. 

SSoa 0/ irerh. Recording of bo.h eruption and decay or 
BUings[p. 35 ]. „f,he child's readiness to 

These separate tests 

Ltura.ioncheckhs.^“-P^ 

example of this app -55 a chedd«at fo^^ ^ ^ - jevclopmental 

Entrance and Rea mg R'^^ The elminer'sjob to 

There are twenty-five sta it is tn 

areas. Each area begins witn 




check each statement that is representative of the child’s maturational 
performance. Given below are the five areas and representative questions 
from the checklist. 

BODILY COORDINATION* 

The child can . . . 

1. Hop on one foot. 
2. Walk three yards on toes without touching heels on the floor. 

EYE-HAND COORDINATION 

The child can . . . 

3. Cut out pictures neatly, following straight lines, angles and curves. 
4. Draw a recognizable man of head, body, arms, and legs without copy. 

SPEECH AND LANGUAGE COMPREHENSION 

The child can . . . 

5. Pronounce compound consonants correctly in words such as 
   basket, bottle, tree, green, please, thank, sister, brother. 
   Baby talk is outgrown. 
6. Count the five fingers on each hand and add both together 
   to make the correct total of ten. 

PERSONAL INDEPENDENCE 

The child can . . . 

7. Care for self at the toilet, requiring no assistance with paper 
   or clothing. 
8. Tell own full name and address on request. 

SOCIAL COOPERATION 

The child can . . . 

9. Recite verses or sing a complete song, and will do so for the 
   entertainment of others. 
10. Play competitive, active games with other children and keep 
    the rules in such games as: hide and seek, hop-scotch, or 
    cowboys and Indians. 

* From “Individual Record Check List: Maturity Level for School Entrance and 
Reading Readiness,” by Katharine M. Banham, Ph.D., Minneapolis, Minnesota: 
Educational Test Bureau, copyright 1959, by American Guidance Service, Inc. 
Reproduced by permission. 

Scoring of this checklist is based on the number of statements true for 
an individual child. If, for example, twenty of the twenty-five statements 
are true, the child is ready for first grade; fifteen to twenty true statements 
indicate readiness within three to six months. If a child scores under fifteen 
points, the manual recommends attendance at kindergarten or nursery school 
and more definitive testing (Banham, 1959). 
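The scoring rule just described can be written out as a short sketch. It is an illustration under stated assumptions, not the manual's own procedure: the function name is hypothetical, and because the text gives both "twenty" and "fifteen to twenty", the sketch assumes that a count of exactly twenty falls in the "ready" band.

```python
def readiness_from_checklist(true_count: int, total: int = 25) -> str:
    """Interpret the number of checklist statements marked true.

    Assumed bands (boundary at 20 is an assumption, see lead-in):
      20 or more -> ready for first grade
      15 to 19   -> readiness expected within three to six months
      under 15   -> kindergarten or nursery school, plus further testing
    """
    if not 0 <= true_count <= total:
        raise ValueError("true_count must be between 0 and total")
    if true_count >= 20:
        return "ready for first grade"
    if true_count >= 15:
        return "ready within three to six months"
    return "kindergarten or nursery school and more definitive testing"


if __name__ == "__main__":
    for n in (22, 17, 12):
        print(n, "->", readiness_from_checklist(n))
```

As with any cut score, a child a point or two on either side of a boundary deserves a second look rather than a mechanical decision.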

“pmaTsafoTrprudc in mathematics there are many tests, among 

:'?m''atfi":st'.f.heyhmTay» 

subtraction, multiplumtion, an i £„„cncy. Others hate problems 

percentages and the use o m ability to use simple arithmetical 

{hat require abstract reasoning and the ability 
and algebraic procedures. 

FOBEION Languaoi; j,. 3, „( anention focused on foreign 

In recent years there place students in foreign language 

languages. To assist our schools in help - increased the number and 

courses, the ,'rilMtra ly these foreign language aptitutic tests 

r'd^ign‘er!o‘''p:o:i “n l“dica.^-n of a student's probable success 

sissgsissss 

"ont of the P-tTfa'^nThn--'-'- 

l;rm1d\fn'’ 

Greek. There are fi>e part 




memory. The second part deals with the ability to learn speech sounds. The 
third part measures sound-symbol association ability and calls for knowledge 
of English vocabulary. The fourth part is devoted to sensitivity to grammatical 
structure. The fifth part deals with rote memory. In administering the test 
a tape recorder presents the instructions and test questions. 

Tests such as the Modern Language Aptitude Test do not suggest specific 
languages for study but only that a person has or has not a general language 
aptitude. 

Musical Aptitude 

In the fields of music and art, the need for tests that can measure ability 
is self-evident. Many parents have spent much money on their child’s 
music lessons only to find out years later that the child is tone-deaf or has 
little musical ability. There is no single test that can measure the desire of 
the child to express himself musically or his willingness to practice every 
day. In music, as perhaps in no other endeavor, the motivation to stick to the 
task and devote time to learning the skills each and every day is necessary. 
It does not matter how much talent a person has; if he does not have this 
desire to perform and the ability to stick to it, his talents will never be 
realized. 

Most musical aptitude tests include questions aimed at discovering perceptive 
and interpretive abilities, that is, telling the difference in pitch and 
loudness. In addition, the person is tested in his esthetic judgment of a 
melody or harmony and a rhythmic pattern. The test most widely used by 
music educators and our schools is the Seashore Measures of Musical Talents. 
There are six parts to this test, all of which are on phonograph records, each 
testing a different aspect of musical ability. In the first test the person is 
asked to judge which of two tones is higher in pitch. In the second, he is asked 
to judge the louder of two sounds. In the third, time intervals are presented, 
and the student is asked to judge which of two is longer. In the fourth, 
rhythm is presented, and the individual is asked to tell if one of two rhythms 
is different or if they are both the same. In the fifth, the task is to judge which 
of two tone qualities is more pleasing. The sixth test is concerned with 
tonal memory; that is, the student is asked to judge whether two melodies 
are the same or different. In each test the judgments become increasingly 
harder with each item. 

Many musicians and other critics have complained that the Seashore tests 
are not related to the musical activities of the musician. That is, the ability 
to tell fine differences in time and pitch is not needed by the musician. 
Be that as it may, the Seashore Measures of Musical Talents remains our 
best test of musical ability, and if used with other forms of evaluation, it 
can give some indication of musical talent. In the final analysis, however, 
you should bear in mind that the actual musical achievement and rate of 
progress of the person is probably the best predictor of future musical 
achievement. 




Artistic Aptitude 

In the field of aptitude in visual art, several types of tests are available. 
There are tests of esthetic judgment, design, and actual drawing. Critics of 
art tests have admitted that these tests can show differences between art 
students and other groups. However, they contend that this is because 
of achievement rather than ability. Thus they state that we are measuring 
what the person has learned rather than his ability to learn or do well in the 
field. 

One of the most widely used art aptitude tests is the Meier Art Judgment 
Test. This test consists of items that each present two versions of the same picture. 
One picture is a recognized masterpiece and the other is the same picture 
with some slight change, for example in the 
balance of the picture as a whole. The student is asked to 
choose the better picture in each pair. 

Other tests, such as the ' which the person must 

drawings. In this teat, hues and dots arc given 

make a sketch. ,„chcrs and counselors should consider 

As in the test of musical ability, tea* achievement, and his art 

such other factors as *'"'”ute before counseling students 

instructor’s rating along "’“J' chapters 12 and 13, for a thorough 

definite terms. (See Super and . ■ , Irtislic aptitude tests.) 

and definitive discussion of musical and aruslic p 

Graduate School Tests 

. hitticallv a combination ot 

The tests in the graduate college graduates. The Gmduatc 

scholastic aptitude and achievement widely used test for ad- 

Record Examination ^t'he GRE is nothing more than “ 

mission to graduate school Pa other sections evaluate 

Miller Analopes T« -^^^^_^i, field. 

from many dinercn jg GRb) ana >«. ^ r (j||T„ent 

restricted to ''“f,'l3„eofcontenl.ret“"«’'"”™?^^^^^^^ language and 

applied to pmven j including j" '. ?uaincss®kminisrration, 

prrst^r^chools “Ch os Par special purposes, sec 




Using the Results of Aptitude Tests 

It is extremely important that the individual as well as his parents, teachers, 
and counselors be aware of his assets and limitations. The school counselor 
works with many different kinds of young people. 
The results of aptitude tests help him in guiding such youngsters as, for 
instance, the boy with average ability who hopes to be a [. . .], 
the student whose parents view him unrealistically and aspire for him either 
above or below his abilities, the person who performs poorly in academic 
areas but is talented mechanically, the girl with superior intelligence who is 
not aware of her potential, the boy from a poor economic background who is 
willing to settle for an occupation below the one he is capable of succeeding 
in, and so on. 

The aptitude test and/or battery can provide a basis for assisting not only 
in personal counseling but also in sound curricular planning. The school 
needs to know what courses to offer and who should take them. The data 
provided by aptitude test results help determine an appropriate course of 
action. 

To illustrate, in everyday school terms, the actual use of aptitude tests, 
let us listen to Mr. Sanders, a high school counselor, explain the use of 
aptitude tests to a group of parents. He has just finished his introductory 
remarks concerning aptitude testing. 


Mr. Sanders: Are there any questions? 

Parent: Yes, Mr. Sanders, I have a question. You said that our 
school gives algebra and foreign language aptitude tests to incoming 
freshmen to help place them in the types of courses that are suited 
to their abilities. Does this mean that if my son does poorly on one 
of these tests he cannot take these subjects? 

Mr. Sanders: No, Mrs. Smith, that isn’t exactly what I meant. 
We can only advise you and in the final analysis you and your son 
must make the decision. Besides, these tests do not mean that your 
son or daughter should never attempt algebra or a foreign language. 
What they do signify, however, is that the chances for success or 
failure, at this particular time, are greater with certain students. And 
it would probably be best if the child who does poorly on these tests 
waits at least until his sophomore year before attempting to take 
courses in the particular subjects. The results of these tests are not to 
hurt or bar students from their right of education, but only to help 
them make wise choices that are in line with their talents. In the long 
run, the child is much happier for he need not experience failure in 
areas where his talents are not as great. 

Parent: Do you mean to say that my child hasn’t the right to try 
a subject, if you think he may fail? 

Mr. Sanders: No, not at all. In a democratic society people have 
the right to fail as well as to succeed. In the school the same situation 
is true. The point is that the school attempts to educate everyone and 
different children have varied abilities. You wouldn’t want to push a 
child into the water who couldn’t swim, though it is possible he could 
learn while in the water, but also he might drown. In the same way 
we do not want to start a child in algebra if the chances are he will 
fail. Isn’t it best to first teach the child in swimming or in algebra the 
essentials of these skills before expecting him to perform? 

Parent: I see your point. In other words, tests help to determine 
the most profitable areas of study for the child to enter at this time. 
Mr. Sanders: Exactly, but a child is always given ‘he chance to 

'''par^: Mr. Sanders, may I change the subject a little! 

"My'sontt’L ts in high schoel and he h. had aptitude 
tests in art and mustc. Why don't you P'""'' f 

Mr. Sanders: We do gtve tests f "t U „eedd 

However, we give this type They arc given only 

I mean, these tests are not g«n o " I i„ these 

Itas" O: rna”i^lSs^:::rest=d in feting h,s ahiii.ies and is 
"“pSNTrifthi^also' «ue foy „„,i„„al aptitude 

mechanical, clerical, and o her sUlls-no. For romple. 

If you mean individual tests Differential .Aptitude 

your children ate given a jear of high schoo and 

Tests. This test is S«" ^ gives us a general idea of 

again in the junior year. Th'S «' ‘ vocational areas, 

the aptitudes your child "“F y „f general 'f | 

correct m their ^^in. ki me ,hcrc 

, , the less chance of error. ^ youngster, w 

get information to he p 




is a point when too many tests can be a waste of time. 
But at Jones High School, we give what is necessary. As I 
stated, not all types of tests are given to every child. Different tests 
for different reasons are given to different youngsters. 

Teachers and counselors, of course, may find that their school has different 
ideas from those of Mr. Sanders or they may discover that financial resources 
to support an ideal testing program are lacking. Today, with federal and 
state aid, however, most school districts can arrange to have an adequate 
testing program. If certain tests are needed for an individual pupil which 
are not available in a particular school system, it may be possible to refer 
the student to a public agency or a psychologist in private practice. 

The important questions to keep in mind are the following: 

1. What information does the school need to provide the best education 
for its particular students in its particular educational and 
geographic setting? 

2. What information is needed to help each child develop his own 
unique potential? 


Aptitude tests help provide the answers to these very important questions. 


References 

Banham, K. M. Maturity level for school entrance and reading readiness: for kindergarten 
and first grade. Minneapolis: American Guidance Service, Inc., 1959. 

Bennett, G. K., Seashore, H. G., and Wesman, A. G. Differential Aptitude Tests 
manual, forms L and M. (4th ed.) New York: The Psychological Corporation, 
1966. 

Bingham, W. V. Classifying and testing for clerical jobs. Personnel Journal, 1933, 
14, 163-72. 

Cronbach, L. J. Essentials of psychological testing. (2nd ed.) New York: Harper & 
Row, 1960. 

Educational Testing Service. An annotated list of testing programs for selection and 
special purposes. (Rev.) Princeton, N.J.: Educational Testing Service, 1967.(a) 

Educational Testing Service. Handbook cooperative primary tests. Princeton, N.J.: 
Educational Testing Service, 1967.(b) 

Educational Testing Service. Handbook SCAT Series II. Cooperative school and 
college ability tests. Princeton, N.J.: Educational Testing Service, 1967.(c) 

Fleishman, E. A. Testing for psychomotor abilities by means of apparatus tests. 
Psychological Bulletin, 1953, 50, 241-62. 

Ilg, F. L., and Ames, L. B. School readiness. New York: Harper & Row, 1965. 

Kuhlmann, F., and Finch, F. H. Kuhlmann-Finch scholastic aptitude tests. 
Minneapolis: American Guidance Service, 1952. 

Lawshe, C. H., Tiffin, J., and Moore, H. Purdue clerical adaptability test, revised 
edition. West Lafayette, Ind.: University Book Store, 1956. 

Super, D. E., and Crites, J. O. Appraising vocational fitness: By means of psychological 
tests. (Rev. ed.) New York: Harper & Row, 1962. 

Thurstone, T. G. Examiner's manual: Primary mental abilities for grades 2-4. 
Chicago: Science Research Associates, 1963. 



CHAPTER 





In Chapter 9 we talked about aptitude tests and how they help the school 
in planning educational programs and guiding each individual youngster to 
realize his fullest potential. We stated that the primary objective of the 
aptitude test was to measure an individual’s potential to learn or succeed, 
in school, at a vocation, or in an artistic endeavor. Simply stated, then, an 
aptitude test attempts to measure what a person can do. In this chapter we 
discuss tests that measure what a person has done. 

The primary goal of the achievement test is to measure past learning, 
that is, the accumulated knowledge and skills of an individual in a particular 
field or fields. As we have stated, achievement and aptitude tests overlap. 
Can we test achievement without also testing capacity or ability? In a purely 
theoretical sense, we cannot. The difference between aptitude and achievement 
tests is one of degree or objective. Achievement tests emphasize past 
progress, whereas aptitude tests are primarily concerned with future potentialities. 
Lindeman (1967) presents the differences between aptitude and achievement 
tests quite well when he states, 


ACHIEVEMENT TESTS 

The primary distinction between aptitude and achievement tests 

items are selected on the basts of to ^ J ,,,ec,ed on 

ttiroL"S:^:Xpsme„t•o^previousiyspe..^ 
i content and objectives [pp. 107-08], 

The achievement tests used published 

Sdiaera 1 hleJel'r.es" and batteHes can be of unique importance 
in many areas of the total /'“Ito „ „ i„s,rument produced by 

A standardized achtevement test or ba efforts of 

a test publisher for national • examine educational objccti'C< 

professFonal test ''‘P'"" ""Vtirdfffel from the classroom cvamination 
Ld goals. The standardized test d,to ,, 5 , j, made up by a 

in its scientific development. A be used again. Phe teacher 

teacher for her own pupds and may » J investigate in a sn'ntific 

does not have the time, l^“'''toV°'oXr hfnd, standardized aehtcteto 
manner the value ^ scientific procedures to ensure ° ^ 

tests are run through "gatous ^t^ob P __ „r,„dord,zed achtetentent 
In this chapter our attention wall toe 
test. 


Construction of Achievement Tests 

The construction o^ an eonsnuomn o^^^ 

:ld to be examined. exhaustive and defin ^nd 

glc’: 

■rintoto irems 

The representative of subject nutter. Dunng 

bcir performance i. 





Reasons and Objectives 

The first step in the construction of a standardized achievement test is to 

(Educational Testing Service, 1957) as an example of this first step. 

1. The tests will focus on skills and concepts basic to 
development in reading, writing, listening, and mathematics. They will 
test understanding and thinking, in addition to memory or matching. 

2. Since learning is the major goal of our schools, the tests will be 
clearly related to instructional processes, so that teachers can make 
direct use of the results with individuals and groups. 

3. The tests will be designed to measure attainment of major 
educational objectives, regardless of particular curriculum programs 
and methods. 

4. The tests will minimize the dependence of one skill upon 
another, for more deSnitivc descriptions of pupil development, hor 
example, no reading will be required on the Listening tests. 

5. Every effort will be made to engage the interest of young children 
and secure valid responses and meaningful demonstrations of their 
ability. 

6. The tests will be as convenient as possible for busy teachers to 
give and score [p. 6]. 


Content Outline 

The second step involves an exhaustive and definitive outline of the subject 
matter to be tested. This is the outline of skills stated in the Doren Diag- 
nostic Reading Test (Doren, 1956): 

Unit I Letter Recognition*

A. The ability to recognize the same letter when it occurs again, to
distinguish it from a letter of similar configuration.

B. The ability to recognize as the same letter, the capital and lower
case forms.

C. The ability to recognize the same letter when presented in
different type or style, whether in print or script.

Unit II Beginning Sounds

A. The ability to recognize the sound of a letter and associate that
sound with its printed form.

* From M. Doren, Doren Diagnostic Reading Test of Word Recognition
Manual of Instructions with Suggestions for Remedial Activities. Minneapolis,
Minnesota: American Guidance Service, Copyright 1956. Reprinted with permission.




B. The ability to choose the correct beginning sound when supplying
a word in context.

Unit III Whole Word Recognition

A. The ability to select identical words in a group of words of similar
appearance.

B. The ability to make discrimination in sound and appearance in
words with similar elements.

Unit IV Words within Words

A. The ability to find the two parts of a compound word, to recognize
smaller known words in reading a larger compound word.

B. The ability to find small helping words within larger words which
[...]

C. The ability to make judgments in using this form of word attack.

[...]

B. The ability to [...] a word from the visual perception of a
speech consonant.

[...]

A. The ability to [...] a word by its ending sound from a group of
similar words [...]

B. The ability to choose [...] sentence, in words [...] same definition
with [...] plurals with irregular endings, [...] context.

Unit VIII Rhyming

A. The auditory ability [...] two words that rhyme.

B. The ability to recognize, by [...], two words whose printed forms
rhyme.

C. The ability to recognize that [...] like words do not always rhyme.

D. The ability to recognize that [...] rhyme.

[...]

A. The ability to [...] vowels and associate each with its [...] form,
to distinguish [...] word by its vowel sound from [...]

D. The ability to [...] vowel heard is long or short.



E. The ability to recognize the correct vowel sound in basic words,
[...]

G. The ability to determine whether a vowel is long or short when
[...]

H. The ability [...] how some vowel combinations create a
new sound and to recognize its occurrence and exceptions.

J. The ability to recognize that in some words the [...]
assumes the sound of a different vowel.

[...]

A check on the child's fund of sight words with non-phonetic spelling
by means of his recognition of the same word spelled phonetically.

Unit XI Discriminate Guessing

The ability to supply missing words from a clue given by other [...]
in context.


Sample Administration 

After the standardized achievement test is constructed, it is given to a
sample group of children. The results are then analyzed to find out
whether the test is measuring what it is supposed to measure. For example,
the authors of a social studies test have decided to construct a test that
will measure the student’s understanding of the currents of history that led
up to the Industrial Revolution. They want to know whether they are
measuring this area of knowledge or whether they are measuring reading
ability, spelling, and so forth. In addition, they analyze the results to see
whether children, upon retaking the test in another form, show similar scores.
If they find that the test is meeting their objectives and is doing so
consistently, they then consider publishing it.

This process is not merely based on inspection or intuition. After the
administration of the test, usually to a thousand or more students, an analysis
of each test item is made. The requirements each item must meet are (1)
easiness, (2) discrimination, and (3) distribution. Let us briefly examine each
of these.


Easiness 

Analysis of easiness is concerned with the percentage of students who
answer an individual item correctly. If, for example, 100 students took an
examination and thirty of them answered a specific item correctly, we would
state that 30 per cent answered correctly.* Here is a list of sample items
with percentages of students answering each item correctly.

* To find the percentage on each test item, simply tally the number of students
who get a given item correct and then divide by the total number of students taking
the test.






The preceding items reveal certain [...] seem to be relatively easy,
whereas items 2, 6, 7, and 8 seem [...] quite difficult. Items 1, 4, and
10 are between these [...].

It is necessary to estimate [...] because very easy or very difficult
test items tell us [...] students from the [...]. These easy items only
serve [...] one that 50 per cent of the students answer correctly and
[...] per [...]. Thus it provides the greatest number of
discriminations. [...] posed of different students on each [...] student
may answer all items correctly and another may miss [...]. Test
producers usually remove items that [...] per cent or more answer
correctly [...] per cent or less miss.
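
The easiness tally described in this section (count the students who answer an item correctly, then divide by the number tested) can be sketched in Python. This is only an illustration: the function name `item_easiness` and the layout of the score sheet (one list of 0/1 item scores per student) are assumptions made for the example, not part of any publisher's procedure.

```python
def item_easiness(score_sheet):
    """Return the per cent of students answering each item correctly.

    score_sheet is a hypothetical layout: one list per student,
    holding 1 for a correct answer and 0 for an incorrect one.
    """
    n_students = len(score_sheet)
    n_items = len(score_sheet[0])
    percents = []
    for item in range(n_items):
        # Tally the students who got this item right.
        n_correct = sum(student[item] for student in score_sheet)
        percents.append(100.0 * n_correct / n_students)
    return percents


# The example from the text: 100 students, thirty of whom answer one item correctly.
sheet = [[1]] * 30 + [[0]] * 70
print(item_easiness(sheet))
```

For the 100 students in the text's example, the function reports 30 per cent for the single item.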


Discrimination

Discrimination analysis is used [...] standardized test construction to
determine the degree to which [...] measures the same thing as the total
test in which it is included. First [...] 25 per cent of students (total
test scores) and the bottom 25 per cent [...] analysis of each item is
made by tallying the [...] the top and bottom students who answered the
item correctly. [...] the bottom group [...] subtracted from the [...].
The resulting data [...] the extent to which a given item [...]. The
larger the difference, [...] not [...] is not discriminating and is
[...] students who know the most. Most [...] the minimum difference
for an item [...] discriminates.
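
A minimal sketch of the top-and-bottom comparison follows, assuming the procedure is as outlined above: rank students by total test score, take the top and bottom 25 per cent, tally how many in each group answered the item correctly, and subtract the bottom tally from the top tally. The function name `discrimination_difference` and the sample data are hypothetical, and the sketch does not apply any minimum acceptable difference.

```python
def discrimination_difference(total_scores, item_correct, fraction=0.25):
    """Top-group tally minus bottom-group tally for one item.

    total_scores: total test score for each student.
    item_correct: 1 if that student answered the item correctly, else 0.
    fraction: share of students in each extreme group (25 per cent here).
    """
    n = len(total_scores)
    group_size = max(1, int(n * fraction))
    # Student positions ordered from lowest to highest total score.
    order = sorted(range(n), key=lambda i: total_scores[i])
    bottom_group = order[:group_size]
    top_group = order[-group_size:]
    top_tally = sum(item_correct[i] for i in top_group)
    bottom_tally = sum(item_correct[i] for i in bottom_group)
    return top_tally - bottom_tally


# Hypothetical data for eight students.
totals = [10, 20, 30, 40, 50, 60, 70, 80]
item = [0, 0, 1, 0, 1, 1, 1, 1]
print(discrimination_difference(totals, item))
```

A difference of zero or less would mean the item is answered correctly no more often by high scorers than by low scorers.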





Distribution

[...] for example:


1. The sum of 40 and 40 is        Per Cent Choosing Answer

   a. 0                                      1
   b. 80                                    58
   c. 70                                    16
   d. 40                                    25


An inspection of our example reveals several important factors. First,
this is statistically a fairly good item in that a little over half (58 per cent)
of the students obtained the correct answer. Second, we can see whether
some of the answers are too easy and should be replaced by new answers that
are better alternatives. This is obviously true of the “a” alternative, which
was marked by only 1 per cent of the students.

Ambiguous items may be identified by this kind of analysis. If, for example,
two alternatives, one of which is the correct answer, produce an equal or
near-equal response percentage, there is a good chance that students have
a good reason for choosing the incorrect alternative.
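
The distribution check can be sketched as follows, using the percentages of the sample item above. The function names and the two cut-off values (`threshold` for a rarely chosen alternative, `margin` for a near-tie with the correct answer) are assumptions chosen for illustration; the text does not give numerical cut-offs.

```python
from collections import Counter


def response_distribution(choices, alternatives):
    """Per cent of students choosing each alternative."""
    counts = Counter(choices)
    n = len(choices)
    return {alt: 100.0 * counts.get(alt, 0) / n for alt in alternatives}


def rarely_chosen(distribution, key, threshold=5):
    """Wrong alternatives chosen by fewer than `threshold` per cent (assumed cut-off)."""
    return [alt for alt, pct in distribution.items()
            if alt != key and pct < threshold]


def near_ties(distribution, key, margin=5):
    """Wrong alternatives within `margin` points of the correct answer (assumed cut-off)."""
    return [alt for alt, pct in distribution.items()
            if alt != key and abs(pct - distribution[key]) <= margin]


# The sample item: 1, 58, 16, and 25 per cent chose a, b, c, and d.
choices = ["a"] * 1 + ["b"] * 58 + ["c"] * 16 + ["d"] * 25
dist = response_distribution(choices, ["a", "b", "c", "d"])
print(dist)
print(rarely_chosen(dist, key="b"))
print(near_ties(dist, key="b"))
```

With these figures only alternative “a” is flagged as rarely chosen, and no wrong alternative comes close enough to the correct answer to suggest ambiguity.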


Standardization and Norms

After a careful statistical analysis of the data, the test is administered to
thousands of children throughout the United States who represent a cross
section of the population in terms of age, grade, geographic location, and, in
some cases, race and socioeconomic status. Norms are obtained from this
analysis and are reported in the form of percentiles, grade equivalents, and
so forth. After all this is done, a manual for administering, scoring, and
interpreting the test is written. It is obvious from what has been stated to
this point that a standardized achievement test is quite time-consuming
and expensive to construct.
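
As a rough illustration of how a percentile norm relates a raw score to the norm group, the sketch below uses one common convention: the per cent of the norm group scoring below the given score, plus half of those scoring exactly at it. Both the convention and the made-up norm sample are assumptions for this example; a publisher's manual may define and tabulate percentiles differently.

```python
def percentile_rank(norm_scores, raw_score):
    """Percentile rank of raw_score within a norm sample.

    Uses the 'below plus half of ties' convention (an assumption here).
    """
    below = sum(1 for s in norm_scores if s < raw_score)
    ties = sum(1 for s in norm_scores if s == raw_score)
    return 100.0 * (below + 0.5 * ties) / len(norm_scores)


# Made-up norm sample: one pupil at each raw score from 1 to 100.
norm_sample = list(range(1, 101))
print(percentile_rank(norm_sample, 50))
```

A pupil's percentile rank depends entirely on the norm sample, which is why the composition of that sample matters so much.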

The following sections from the Directions for Administering the Stanford
Achievement Test: Intermediate I Battery for Grade Four to the Middle
of Grade Five (Kelley et al., 1964) present a whole, but brief, picture of
what actually takes place in constructing an achievement test.


CONSTRUCTION*

In preparing this latest edition of Stanford Achievement Test, a
major goal was to make sure that the content of the test would be in
harmony with present objectives and measure what is actually being
taught in today’s schools. To make certain that the test content would
be valid in this sense, the construction of the new edition (as of each
earlier edition) was preceded by a thorough analysis of the most
widely used series of elementary textbooks in the various subjects,
of a wide variety of courses of study, and of the research literature
pertaining to children’s concepts, experiences, and vocabulary at
successive ages or grades. On the basis of this analysis, the authors
prepared detailed outlines of the content to be covered by all tests
at all grade levels. These outlines specified the relative proportion
of content to be devoted to the various skills, knowledges, and
understandings within each area and served as blueprints for the tests that
were ultimately to emerge. At this stage, as well as throughout the
whole developmental process, reliance was placed on the judgment
of subject-matter specialists in the several areas.

* From the Stanford Achievement Test: Intermediate I Battery, © 1964 by Harcourt,
Brace & World, Inc. All rights reserved. Reproduced by permission.

ITEM ANALYSIS PROGRAM

The experimental [...] administered to approximately 49,000 pupils in 19
school [...] about a month of the closing of school [...] spring, 1961.
[...] importance of the decisions with respect to elimination [...]
items that were to be based on results of this [...] to have the tryout
sample a representative one with respect to such characteristics as
[...] distribution, size of school system, [...] urban character, [...]
the textbooks in use in [...] avoiding text-related bias in [...].
Classroom teachers administered the tests in practically all instances,
in order that the [...] administration would correspond most nearly to
the typical [...] of the tests. In addition to the experimental edition
of [...] Test, every pupil was given an intelligence test, the [...]
Test, in order that data would be [...] on the equivalence of ability of
the groups [...] the several forms (necessary for comparability of
item-difficulty values); and for checking on the extent to which the
item-analysis sample [...] typical one with respect to general ability
level. The [...] administered essentially without time limits in order
that all pupils [...] have an [...] at least one grade below and one
grade above this intended range.

This was done in order to assist in the selection [...] that would
[...] level answering the item correctly, and, in the case of
multiple-choice items, the number of pupils selecting each of the [...]
to the item as well as the correct response. The [...]
correctly were converted to per cents, and these per cents for successive
grades for a given item were considered to constitute an item profile,
revealing the extent to which an item correlated with progress through
school. These item profiles were considered one of the most important
indices of item validity, and considerable weight was attached to them
in the selection of items for the final forms. Results of this item tryout
permitted identification of ambiguous items, of items either too easy
or too difficult for the grades for which they were intended, and of
items unsatisfactory in other respects. Such items were eliminated
from consideration for retention in the final forms.

Each teacher participating in the administration of the experimental
editions was asked for comments, criticisms, or suggestions for
improving the tests. Teachers were asked to comment with respect
to clarity of questions and directions, appropriateness of content,
format, and typography, suitability of item types, and other aspects
of the test.

The content of the final forms of the test was selected from the
total body of material tried out experimentally in such a way that the
final tests conform to the original specifications with respect to content,
relative emphases, etc., that they are of appropriate difficulty for the
grades in which they are intended to be used, and that the several
forms are highly comparable in content and difficulty.

THE STANDARDIZATION PROGRAM 

The first step in the standardization program was the establishment
of specifications for the norm group with respect to such characteristics
as geographic distribution, types of school systems to be
included, numbers of pupils desired per grade, and the extent of
participation within cooperating systems. The distribution according
to region and size of system was established in such a way that the
norm sample would duplicate these characteristics for pupils in
average daily attendance in public and private schools throughout
the country. It was further decided that in all participating systems
every pupil in regular classes in at least six consecutive grades would
be included in the standardization program so that there would be
no question of selection within systems.




The desired representation in terms of number and kinds of
systems was worked out on a state-by-state basis, and invitations
were extended to school systems in the various states meeting the
desired specifications. A sufficiently large number of systems were
invited to avoid undue influence by any one system. A total of 264
school systems drawn from 50 states participated; over 850,000
pupils were tested as part of this program.

THE NORM SAMPLE 

Public schools (integrated, segregated white and [...] according to
size [...] systems comprising the norm group are [...] and geographic
[...]. [...] these two factors were compared with [...] census figures
[pp. 23-24].

The reader should note that [...] in the test's construction.

Differences Between Teacher-Made and Standardized Tests

1. The standardized test [...] common to representative [...]. The
teacher-made test is based on [...] classroom or in some [...] within a
school.

2. [...] constructed for a limited topic, [...] standardized test
encompasses large areas of content.

3. [...] professional educators [...] statisticians develop [...]
usually created by one [...] limited number of teachers.

[...]

5. The standardized [...] various educators and test [...] who attempt
[...] scientific procedures.* The teacher-made test is usually constructed
by a person untrained in measurement and without much
experience with scientific techniques.

The standardized test has obvious advantages over [...];
however, some of the unique attributes of the [...]
disadvantages. The consensus of professional experts on curricular [...]
along with a careful analysis of textbooks and [...]
good general picture, but it does not allow for individual [...]
of local school goals and student populations. It is at best a [...]
blending of the best thinking, a consensus, an average, not a [...]
evaluative instrument for a given educational setting. In addition, the
test is fixed at the point in time when it was developed and published and
is not flexible to new situations and educational change. It cannot [...]
limited local needs. It can, of course, be revised, and the better tests are,
as educational goals are changed and modified.

The chief value of the standardized test resides in its national scope, which
enables the school district to compare its progress with that of other schools
throughout the country. It enables the guidance staff to compare student
progress with potentiality as indicated by aptitude tests. The norms presented
in the standardized test manual make these comparisons possible.

Ebel presents three common fears or misconceptions educators have
regarding standardized achievement tests. The first of these is the feeling
that the goals of education are too subtle and complex to be effectively
measured. Ebel (1968) in discussing this states,

Teachers of young children know that the development of skills
in the tool subjects and the establishment of solid foundations for
understanding and interest in the major fields of human knowledge
are concrete, specific, important objectives. But some of them may
feel that tests, especially objective standardized tests, fail to get at
the real essentials of achievement in these skill and foundation subjects.
This mystical devotion to a hidden reality of achievement which is
more essential than overt ability to perform has never satisfied the
research worker. He wants to know the nature of this hidden reality
and what evidence there is that it is important [p. 257].

The second fear is “overconcern” with possible anxiety and stress caused
the child by testing. Ebel (1968) feels that it is more conducive to mental
health for the child to know how he is really progressing than to shield him
from the educational facts of life. “In education, as in medicine and justice,
an excess of present sympathy can postpone or even defeat the procedures
necessary for an individual’s future welfare” (p. 259).

* Scientific here refers to use of the scientific method and measurement procedures
which attempt to gauge validity, reliability, standard error of measurement, and so
forth.

Ebel’s third point is the overemphasis on the uniqueness of [...] school’s
objectives compared to the objectives outlined in a standardized test. He
states that what constitutes a good education in Maine is not [...]
different from what constitutes a good education in California. He states
that a teacher should not expect [...] well she has taught everything,
[...] “[...] the very [...] ought to have achievements which are truly and
rightly unique to a particular school or teacher, locally constructed tests
are the best answer” (pp. 259-60).

In weighing the advantages and [...] of standardized achievement
testing, the student of measurement should not think in black or white terms.
[...] day-to-day [...] and standardized test provides a whole
educational picture of [...].

Thus the user of standardized achievement tests can make broad
comparisons between schools or [...] achievement and [...] publisher, [...].


Types of Achievement Tests

Standardized [...] study skills. In using [...] the market for tests in
[...] or specific subject areas is [...] at the junior and senior high
school level. We shall [...] batteries and tests covering different
subject-matter areas and at different grade levels.


Achievement Test Batteries 

The general achievement test battery provides the best all-around
evidence concerning academic progress. Some authorities recommend that
they be administered every year if the school can afford it. ([...]
Nunnally, 1964.) We shall have more to say on this subject in our
discussion of school testing programs. However, decisions on when and how
often any test should be administered are dependent on the local school
and its unique needs.


Scope and Content

Achievement test batteries differ from one another in their (1) breadth of
coverage and (2) level of understanding required. General achievement
batteries cover subject-matter and school skills from the primary grades
through high school. Table 23, for example, illustrates the range of content
and subtests employed in the various forms of the Stanford Achievement
Test. Note that in the primary battery only six types of subject-matter are
measured, whereas the intermediate battery attempts to evaluate ten different
content areas. It is also interesting to note the progression to that point and
then the reduction in the number of areas afterward, paralleling the changing
emphasis in the school curriculum ladder.

Now we turn our attention to the specific content in widely used batteries
and discuss what each subtest attempts to measure.

Word Meaning. Almost all achievement batteries have a subtest concerning
word knowledge.* The batteries, however, vary in the degree to which they
measure this area. Some evaluate word understanding in the context of a
paragraph, whereas others measure it more directly and yield a separate word
knowledge or vocabulary score. Others combine both approaches and yield
paragraph-reading and vocabulary scores. Two of the most widely used
methods, in the multiple-choice format, are shown in the following examples.
(In the primary grades, pictures are used.)


1. Boy means almost the same as 

a. girl 

b. man 

c. woman 

d. child 


* A notable exception is the Sequential Tests of Educational Progress (STEP),
a battery widely used and respected.



Table 23. Stanford Achievement Test: Subtests for Various Grades




2. Boy means the opposite of 

a. girl 

b. child 

c. man 

d. woman


Reading. This basic content area, common to all general achievement
batteries, presents the student with connected passages from [...].
The tests vary in types of questions and length of passage. [...]
passages, fifty to 100 words, with two or [...] passage. Others have a
small number of long passages (500 or [...]) with as many as twenty test
questions referring to a single passage. [...] illustrates some sample
questions from the Reading [...] Primary Tests for grades 1 and 2, each
illustrating different methods of evaluating reading achievement. The
children are instructed to [...] in the arrow, then mark the box that
goes best with it. Question 1 [...] to evaluate comprehension by having
the student identify an [...] picture with a word stimulus. Question 2
evaluates comprehension in the same manner but uses poetry. Questions 3
and 4 attempt [...] student’s ability to extract meaning from a
paragraph, and question [...] evaluates the student’s ability to
interpret the passage he has read.

An example of a test item based on a paragraph and questions [...]
more complex types of reading achievement follows.* In the actual [...]
there are a total of nine questions on the sample paragraph.


The electoral college is the group of officials directly responsible
for electing the president of the United States. Each of the fifty states
has its own electoral college. According to Article II, Section 1, of the
Constitution, each state selects the members of its electoral college
in whatever way its legislature sees fit, on the basis of one elector for
each senator, and one for each representative to which the state is
entitled in Congress. This means that the number of electors in each
state varies according to its population, and that some states, such as
heavily populated New York and California, have a greater number of
electoral votes than sparsely settled states such as Nevada. The total
number of electors from all the states constitutes the national electoral
college.

1. Perhaps the best title for this paragraph would be
a. “The Importance of Electoral Votes.”
b. “The National Electoral College.”
c. “The Electoral College of [...] States.”
d. “How a President Is [...]”

2. The number of electors to [...]
a. the Bill of Rights.
b. the Twelfth Amendment.
c. each state’s Constitution.
d. Article II, Section 1, of the Constitution.

3. All the state electoral colleges together form the
a. Hall of the House of Representatives.
b. National Electoral College.
c. House of Representatives.
d. Constitutional Convention.

* From Test [...], Reading, of the High School Placement Test. Copyright, 19[...],
the Scholastic Testing Service, Inc. Used by permission of the Scholastic Testing
Service, Inc.

[Figure: sample reading items for the primary grades, including the poem
“I have a dog. / I had a cat. / I've got a frog / Inside my hat.” and a
passage beginning “These leaves come from two kinds of trees.”
(Educational Testing Service.)]


Arithmetic. As with reading, all achievement batteries attempt to appraise
arithmetic skills and understanding. The batteries differ in their [...]
emphasis on computational skills, problem solving, concepts, and understanding.

One of the difficulties that test authors encounter in appraising arithmetic
ability, especially in the area of problem solving, is the separation of reading
achievement from mathematical skill. The mathematics section of the
Sequential Tests of Educational Progress (STEP) attempts to overcome this
obstacle. Educational Testing Service (1957), which developed the test,
asked themselves: “What are the important threads of emphasis which should
underlie a good mathematics program for general education from grades [...]
through 14?” (p. 77). The major concepts* they evolved are (1) number and
operation, (2) symbolism, (3) measurement and geometry, (4) function and
relation, (5) proof (deductive and inferential reasoning), and (6) probability
and statistics.

These concepts receive varying degrees of emphasis at different grade
levels of the test. Two per cent of the elementary items, for example, contain
the concept of symbolism, whereas the college-level form devotes 13 per cent
of its items to this concept. Questions involving number and operation, on
the other hand, account for one half of the total number of items at the lowest
grade level and represent only one fifth of the total at the college level
(Educational Testing Service, 1957). See Figure 27 for some representative
items for grades 10, 11, and 12.

The Stanford Achievement Test battery provides different arithmetic
items under different subtests. Figure 28 presents some samples from the
Intermediate I Battery. The Computation Test measures skills in addition,
subtraction, multiplication, and division. Some of the areas that the
“Concepts” test measures are understanding of place value, roman numerals,
the meaning and multiplication of fractions, rounding whole numbers, and
geometric terms. The Arithmetic Applications test attempts to measure
mathematical reasoning with problems that are taken from everyday life.
The authors of the test state that they have attempted to keep the reading
vocabulary below the problem-solving level to avoid interference with the
measurement of mathematical reasoning. Computational tasks are kept to a
minimum to avoid contamination of the measure (Kelley et al., 1964).

* The student interested in a further clarification and discussion of these six major
concepts is referred to the STEP Manual (Educational Testing Service, 1957).


[Figure 27. Representative mathematics items from the Sequential Tests of
Educational Progress for grades 10, 11, and 12. (Test Division, Educational
Testing Service.)]

[...] most achievement [...]

Usage

1. My little sister ______ walk yet.

a. doesn’t

b. don’t

c. neither


[Figure 28. Sample arithmetic items from the Stanford Achievement Test,
Intermediate I Battery: Arithmetic Computation; Test 7, Arithmetic Concepts;
and Test 8, Arithmetic Applications. (From the Stanford Achievement Test:
Intermediate I Battery.)]


2. Did Bob and Mary play ______ together? 

a. good 

b. well 

c. neither 






Directions: Underline the words that should be capitalized: 

on february 22, 1732 the father of our country. George washing- 

Virginia. 


SrS" in .he ~ 

lrd':d"tr:oT:r.he undefined epeee, need punCuednn 
marks. 

Mrs— Robert Jones 
2548 Lansdowne Ave— 

Springfield— Illinois 

Dear Mrs— Jones— -oUection of paintings in the museum 

I have recently Do yon plan to tell any .n the 

a. Chapel HiL No^h Ca^-^ fj in.erea.ed .n 

near future— H s<>— P‘ 
purchasing some of them— 

,vhlch are .o be -ad r .eats use .he tnulnple-cho.ce 

Sut-d^rn-Uapelled 

.desk 

^chalr 

^cup 

cecret 

Bill enjoyed his class in 

a. psology 

b. psychology 

c. pschology 

1070's have Stated that one of 

^r. maior purposes oi 




information in reference sources. Some tests incorporate these tasks in content 
areas such as mathematics and social studies. 

The SRA Achievement Series labels these areas as Using Sources of 
Information and Reading Charts. The following are some sample questions 
from the SRA Achievement Series for grades 6 to 9. 


USING SOURCES OF INFORMATION* 

Directions: This is a test to see how well you can use reference 
materials. Some of the questions ask about which source you would 
use; the other questions ask about what is in a certain type of reference 
book. There is only one correct answer to each question. Now read 
the sample question below. 

(1) If we wished to find something about Abraham Lincoln, which 

of these would be best to use? 

a. A magazine. 

b. A newspaper. 

c. A history book. 

d. A spelling book. 


Another part of this test is designed to test a student's ability to use a 
table of contents. The student is presented with a table of contents and asked 
questions such as the following: 

(2) The table of contents tells us that trouble with the law is discussed in 

a. Chapter I. 

b. Chapter II. 

c. Chapter III. 

d. Chapter IV. 

READING CHARTS 

Directions: This is a test to see how well you can read graphs, tables, 
and maps. First, glance at the chart in order to get an idea about its 
contents. Then read each question and refer back to the specific graph, 
table, or map to decide which ONE of the four possible answers is 
correct. There is only one correct answer to each question. 

* From SRA Achievement Series 6-9 by Louis P. Thorpe, D. Welty Lefever, 
and Robert A. Naslund. Copyright, 1955, by Science Research Associates, Inc. 
Reprinted by permission of the publisher. 




y with so far arc common 

Content XmenU 

I„“ cent yeart content a'® ^ P vt ATonghout the 



£-r=-3;S=S=£S: 

physics, biology, and astronomy. 


Survey Achievement Tests 

The underlying theory of survey achievement tests is the same 
as for the subtests in general achievement batteries. A survey reading test, 
for example, includes most of the same kinds of items as 
the reading subtest of a general battery. The basic difference between a 
survey test and a general battery resides in the depth and extent of coverage. 
The survey test usually concentrates in much more detail on a single subject 
than does the general battery. Further, general batteries do not usually 
include special subjects, and even when they do, it is at a superficial level 
compared to the survey test. This is especially true in areas such as chemistry, 
biology, physics, algebra, and economics. 


When to Use Tests 

Survey achievement tests should not be used except for some special 
reason. The general battery, especially at the primary and elementary school 
level, is the usual instrument of choice. This is because the general battery 
and the various subtests are standardized on the same students and developed 
on the same underlying philosophy. Thus evaluation of student scores on 
different subtests is based on the same principles and normative population. 
When using survey tests in arithmetic and reading, for example, it is more 
difficult to determine the pupil's relative status in these areas because 
discrepancies in scores could be due to differing degrees of academic progress 
or to differences in educational principles and the student population used 
for test standardization. 

In certain situations, however, the survey test is the best choice. There 
are four basic types of special situations that may call for the use of a survey 
test. 


1. To provide more definitive data on a student who does poorly on a 
specific subtest of a general achievement battery. If, for example, a 
student does poorly on the arithmetic subtest of a general battery and 
the school wants more detailed information on strengths and 
weaknesses, they may obtain a more complete picture by administering a 
survey or special test for arithmetic. 

2. To measure special areas of the high school curriculum that are 
not covered or are only briefly presented in a general achievement battery. 
For example, if a school wants an index of achievement in biology, 
they will have to use a survey test of biology rather than a general 
battery which appraises general achievement in science. 




3. To help students plan future educational courses within the school. 
For example, does the student have enough mathematical background 
to take a second year of algebra? An algebra survey test along 
with other data could assist the student in making a sound decision. 

4. To help counsel students for college. If, for example, a student is 
planning to enroll in a premedical course of study, it would be helpful 

Survey tests in these areas could provide this kind of data. 

Of course, the preceding by om'tat basic 

varying circumstMces^ there will be situations that in theory 

types may arise. On the other Factors such as 

different kinds of su,^^ 

tests in Greek and Hebrew to ^ , ^hich should serve as a 

therefore focus our attention on reading 
representative sample. 

An Example: Supvey widely used types of standardized 

The survey reading test IS one of th achievement 

tests. It is very similar to the d g paragraphs and answe 

batteries. Typically eo';eted are (1) abd. V meaning, and 

questions concerning their content, (4, ,.„„„btest 

(3) speed of rcadmg. , y reading test and a 

' The basic difference be"'™" ,he survey test att™P" “ 

of a general battery is, as survey reading test is therefore 

appraise reading ,ime to administer. 

generally longer and ta a survey remedial and 

^2£Sbdqn~j25tS 

survey test is still "“‘^'‘‘ ‘blems on the general battery 
pupils who reveal reading p , ^^vey reading t 

let us turn our atten ion » an representative o mn,), of the 
Reading Record (Bo™ The test is '"""^r./rbins in reading are 
tests used in our b 12). Four bas eveiyday 

rntShtsfudemVoL 

the student’s scores on 
areas follow. 




RATE OF READING* 

Directions for Test 1. The first test is a paragraph reading test. Read 
the material as rapidly as you can, but read it carefully. You will be 
asked questions about it. When the examiner says STOP, look at the 
number at the end of the line you were reading. Find this number on 
the Answer Pad and punch the circle next to it. You will have 2 
minutes for Test 1. You are not expected to read all of the material 
in the time allowed. 

A Century of Agricultural Progress† 

Difficult as it may seem for many of us to believe, the 
greatest problem of all nations has been to make this old        1 
earth yield enough of food stuffs to satisfy the hunger of 
the many millions that inhabit it. Especially has this been      2 
true during the World Wars and most keenly have people 
everywhere been brought to realize it. Yet until com-            3 
paratively recently the genius of the race has not been 
directed toward improving methods of agriculture. La-            4 
bor-saving devices for multiplying the man power and 
efficiency of the artisan and mechanic began to appear,          5 
but the farmer plodded on in the primitive ways of his 
fathers. Each householder was almost entirely self-sus-          6 
taining, producing nearly all that he and his family re- 
quired. He sold but little and bought less. There was no         7 
need for producing more, and a virgin soil and large 
crops did not stimulate inventive genius along agricul-          8 
tural lines. But with the building of cities, the growth 
of manufacturing and the great divisions of labor it             9 
became imperative that the farmer should provide food, 
not only for his own family, but for the ever increasing         10 
army of those not engaged in agriculture. To do this 
brought profits, and the incentive of private gain en-           11 
couraged efforts toward increased productiveness and 
the development of new devices and better methods.               12 

READING COMPREHENSION 

Directions for Test 2. In this test you will be asked questions about the 
material you have just read. You are not to look back at Test 1. Decide 
which of the four answers to each question is right. Then punch circle 
(a), (b), (c), or (d) for the right answer. 

You will have 5 minutes for Test 2. You are not expected to finish in 
the time allowed. 


* From SRA Reading Record by Guy T. Buswell. Copyright 1947, Science Research 
Associates, Inc. Reprinted by permission of the publisher. 
† This sample represents about one fifth of the complete reading passage. 





1. The material in Test 1 is mainly about 

(a) gasoline motors (b) planting (c) farm machinery 

(d) Cyrus McCormick 

— enough 

S macWner/ (b) food (c)clod .lng (d) horses _ 

nTlnerease In non-agrlcultnral popnlaUon compelled farn. 
J’aTmtmase (b) to decrea. (c) to remain the same 
(d) to double ^ 

discovery 

evehvdav nEADiNO sKiLirs ^ telephone direetoiy 

Dinctm for Test 4. In the nex^ addresses on the right- 

on the left-hand page, and a Ib I'lv'’”"' 

hand page. Follorvmg each name a ^ ,he 

nuntbers. Pick out the nf ■ 3 „i„„,es for Test 4. Ton are 

circle for that \°h“ me allotred. 

not expected to finish in the time 

Abbott Jas E 5370 Laureh 
Aitken Edmund 672 j 72IO 

Alsdoif Ernest MlW Elm r 

Anderson Mrs Dora -Wl Mg 

Angelo A T 106 Douglas av. . • 

^ ® r M 1946 Barrypomt. . JOJ’ 

Arntacn C R 3J» “ J*" 5301 

Austin Henry 631 3\aKc ^28 

Baker Darwin 405 ^ ^ ,5488 

Baker Duncan f 17 NParUmf 

Baker John R 244 u tvhlch presents a map, 

fn addition to tlm P-^-iS^:Sn;.bom. Ai^er^t P;^ 
table, and a graph with quesb^ 1 „ p!ck out .be right 

advertisement and h^^^ The studem ^ 

Following eaeh artic^ requests the 

arice. Still another test m 'n 
based on an mdex to a 

. vocABUUAnv Thisisa.es.nftcch*alt«^^^^^^^ 

deS^Efemwordsorphrasesmeans 


16. Duncan Baker. H? N- 
Parkway 

fal 5422 (b) 543-J 

jc) 54is (d) 4478 




and punch the circle for that word. You will have 3 minutes for Test 8. 
You are not expected to finish in the time allowed. 


1. biography .... (a) a graph (b) a 

(c) a textbook (d) a life history 


2. narrative (a) tells a story (b) an argument 

(c) a scientific report (d) a law 

Directions for Test 9. In Test 9 there are nineteen sentences, each 
followed by four words. One of the four words could be used in place 
of the last word in the sentence without changing the meaning of the 
sentence. Decide which word could be used in place of the last word 
in the sentence, and punch the circle for that word. You will have 
3 minutes for Test 9. You are not expected to finish in the time 
allowed. 

3. During the recent typhoon the waves were titanic. 

(a) choppy (b) unusual (c) artistic (d) gigantic 


4. During the trial all of his statements were substantiated. 

(a) disproved (b) impudent (c) verified (d) forgotten 


Directions for Test 10. This is a test of general vocabulary. You are 
to decide which of four words means the same as the first word, and 
punch the circle for that word. You will have 3 minutes for Test 10. 
You are not expected to finish in the time allowed. 


6. hinder (a) rear (b) risk (c) prevent (d) persuade 

7. imperial (a) cruel (b) royal (c) governor 

(d) colonial 


Different reading tests stress different facets of reading skills and are 
represented in various tests in varying proportions. The test user should 
examine the actual test items in order to determine what skills the test is 
really measuring — not only with reading tests but with all achievement tests 
or batteries. Only the test consumer, after critically examining the test items, 
can gauge the validity of the achievement test for his own objectives. The 
same general principles that guide the use of reading survey tests are also 
generally relevant for other types of survey tests. 

The survey test, then, is a measure of the skills and knowledge that com- 
prise a given subject. Certain areas are represented within each test. Emphasis 
is sometimes directed in slightly different ways within each test. The test 
should be chosen that best serves the particular purposes and needs. 




Diagnostic Achievement Tests 

The diap-ostic achievement teat attempts m measure in 
strengths and weaknesses m a speci c diagnostic 

which attempts to measuteoveraU^-^-^^^^^ ^ ,^^^^ar’s 

achievement tests to be familiar . t academic problems 

time is spent in diagnosing the work „ do rte same 

of her students. The diagnostic achievement test attempts 

thing. . isoects of a child’s educational 

The diagnostic test indicates that Priscilla, iiho is 

development. For example, » „al to children beginning the 

in sixth grade, has an reveal that Priscilla tends to make 

third grade. Diagnostic reading t«s ^atituting .cat for tare. Her word 

reversal errors in reading, for ^cognize short vowels and to apply 

recognition is fairly good. Her ) findings, along with other 

known consonant help in planning concrete edueanonal 

classroom and standardized test data, ne p 

treatment. , ^ j,, extension of what most leac ers 

The diagnostic >=l'icvenient test is an diagnostic test presents 

attempt to do in their ''='>'‘‘’>'.f”“ “'r,„„i,ies lor students to reveal 
exercises and problems that P">'"J' „.|,h a systematic technique 

S and work habits. It examine some of the unique 

for evaluating these skills and tabu . 

features of the reading and arithmetic 

subtests yie ding score ’’’1^1 he diapiostic oral reading 

arc simtlar to the s ) Another appro^t^h is th , _r *},:c technique* 

ized paragraphs an required j-ed. the examiner 

the signal. ^ ''“f ‘,„Jisv.hollyorpa'<i>'Jf™'P|,;„„s arc recorded, inie 

*rt“i:i:."omi«edivo^.suW^^^^^^ 

student IS allowed to c ,ble because it enables the 

two PO"8"P,'?%„f,„ndenfsoial reading isjal „ b,,p, ,he 

The recording of a s „f the 

:e“-;::^^S5sSSnosde-^^ 

mrSS^r%S-2^Si;„aeh.The.^^^^^^ 

manner rather than in the 



loose and the teacher proceeds to some 

children's receptivity to the tasks. For example, the directions for time state 
"Allow ample time for all the children to finish each subtest. ... the 
examiner should determine how much of the test is to be given at any one 
time. It will depend on the age of the children and their attention span" 
(p. 7). There are eleven subtests, each measuring a different reading skill, 
as follows: 


Unit 1. Letter Recognition 
Unit 2. Beginning Sounds 
Unit 3. Whole Word Recognition 
Unit 4. Words Within Words 
Unit 5. Speech Consonants 
Unit 6. Ending Sounds 
Unit 7. Blending 
Unit 8. Rhyming 
Unit 9. Vowels 
Unit 10. Sight Words 
Unit 11. Discriminate Guessing 

Doren (1956) in her discussion of norms presents the basic approach to 
the interpretation of diagnostic tests:** 

In contrast to an achievement test, a diagnostic test is not 
administered with a fixed norm as its attainment goal. Individuals in 
need of diagnosis have been classified by previous comparison with 
a norm, or by teacher observation, as in need of more thorough 
examination. In an achievement test, the number of correct responses 
is the measure of the degree of success. In a diagnostic test, it is the 
mistakes which an individual makes that will indicate his areas of 
need, and an exact identification of the types of error will direct the 
examiner to specific remedial work. 

... If an examiner assumes that a child is normal because he 
attains a norm and looks no further for specific areas of possible 
improvement, no diagnostic purpose is served by the test. Attention 
must be directed to the exact nature of the wrong answers. It is 
misleading to a teacher to provide a norm for a diagnostic test because it 
can serve no useful purpose. It is of utmost importance that a teacher 
understands that she must observe the mistakes and do something 
to correct them. The teacher must study the individual test papers, 
and make adjustment in her program to correct the deficiencies [p. 17]. 


"Marsaret Doren, Doren Diagnostu Reading Teit of Word Recognition ShUs: 
Manual of Instructions with Suggestions for Remedial Activities. Copyright 1956, 
American Guidance Service, Inc. Used by permission of the publisher. 





Arithmetic 

A good example of the diagnostic test of arithmetic skills is the 
Buswell-John Diagnostic Test for Fundamental Processes in Arithmetic. The 
student is presented with a series of problems in addition, subtraction, 
multiplication, and division. 

Tht — 1“““ 

examples similar to those listed arc 


Neglecting to carry          Borrowing error 

Example:                     Example: 

    233                          648 
  + 695                        -  74 
  -----                        ----- 
    828                          674 
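The two error types shown above can be made concrete with a small sketch. It assumes the first example is an addition (233 + 695, answered 828) and the second a subtraction (648 - 74, answered 674). The functions are hypothetical illustrations of how such mistakes arise and are not taken from the Buswell-John manual: the first adds column by column and drops every carry, and the second borrows ten when needed but never reduces the next column.

```python
def add_neglecting_carry(a, b):
    """Add column by column, discarding every carry."""
    result, place = 0, 1
    while a > 0 or b > 0:
        digit = (a % 10 + b % 10) % 10  # keep the units digit, drop the carry
        result += digit * place
        a, b, place = a // 10, b // 10, place * 10
    return result


def subtract_forgetting_borrow(a, b):
    """Subtract b from a (a >= b), borrowing ten but never reducing the next column."""
    result, place = 0, 1
    while a > 0 or b > 0:
        top, bottom = a % 10, b % 10
        if top < bottom:
            top += 10  # borrow ten, then "forget" to take one from the next column
        result += (top - bottom) * place
        a, b, place = a // 10, b // 10, place * 10
    return result


def classify_answer(operation, a, b, pupil_answer):
    """Label a pupil's answer as correct or as one of the two error types."""
    correct = a + b if operation == "+" else a - b
    if pupil_answer == correct:
        return "correct"
    if operation == "+" and pupil_answer == add_neglecting_carry(a, b):
        return "neglecting to carry"
    if operation == "-" and pupil_answer == subtract_forgetting_borrow(a, b):
        return "borrowing error"
    return "unclassified error"


print(classify_answer("+", 233, 695, 828))  # neglecting to carry
print(classify_answer("-", 648, 74, 674))   # borrowing error
```

A label of this kind is only a starting point. It names the type of mistake but says nothing about the pupil's work habits, which the teacher still has to observe directly.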


The manual points out that the teacher should not be satisfied with simply 
making a diagnosis and checking the items. She should also study the pupil's 
work habits, that is, how he attacks the problem and what processes he uses, 
and gear her teaching accordingly. This approach is basic to the use of any 
diagnostic test. Diagnostic tests are given only to children who need special 
attention because of scholastic difficulties. They are clinical tools and should 
only be used in that manner. That is, their statistical validation is open to a 
great deal of error and cannot be thought of as having the same degree of 
validity or reliability that other instruments, such as the survey and 
achievement battery tests, have. Thus, the clinical or subjective approach is 
indicated in using these tests as aids in learning rather than as definitive 
assessments of academic progress. 


Major Uses in the School 

Our discussion in this section will center on the primary uses of achievement 
tests in the school setting. It should be noted that only the major uses 
will be discussed in practical school terms. 

Understanding the Student 

To guide a youngster properly in his educational planning an 
understanding of his level of achievement is extremely important. The school 
must know his educational achievement to plan for future academic goals, 




possible remedial assistance, and eventually his or her vocation. In order to 
present the important role of achievement testing in understanding the 
individual pupil, let us listen in to an actual counseling situation. Mr. Smith, 
the junior high school counselor, is talking to Bob Fein, an eighth-grade 
student who is concerned about his future high school program. 


Mr. Smith: Well, Bob, how can I help you today? 
Bob: I'm not sure what courses I should take in high school, you 
know, whether I should start with algebra or not and so forth. 

Mr. Smith: Well, how do you feel about it? 

Bob: I don't know. My problem is this: you told me that I was 
above average on my aptitude tests in arithmetic and science, but 

Mr. Smith: How have you done in your class work? 
Bob: Oh, just fair. 

Mr. Smith: I see. Just fair? 

Bob: What I mean by just fair ,^5; 

Mr. Smith: Well, 'u 

scores and school grades. It j average, and the 

in most areas. However, your class ,„el in 
achievement tests show tha y ^ have the 

reading, arithmetic, and retenc. Thus^iUo^^ 
ability to learn, but you ha as much 

Bob: Are you saying. Mr- 

as I should have by this time? you think. 

"”mr ‘sMi™*V»;tu he. on the other hand, there conid he 
BosTguess so, bn. what should I do as far as my high school 
'’Tr“'sm°™: The final ““Tc of help. Fimt, there is 

prevent you from " p„, off algchra and 

re«" >”'• 

ciencies by taking gcn 




's”nT7'4rimpa^^^^^^^ Bob, is that you know that you 
how will you feel then? 

Mr. Smith: Of course not. You see, Bob, ability and achievement 
are, many times, quite different. I am sure you know of boys who 
could make outstanding baseball players, but never perform 
because they haven't learned the basic skills in playing the game. If 
you take a boy who might be another Babe Ruth and put him in the 
major leagues before he is ready he may never realize his potential. 
On the other hand, if you put him in the minor leagues and give him 
the experience to learn, he may become another Babe Ruth. In the 
same way, if you have the ability to do well in algebra and biology 
but lack the learned skills, you need some experience in the minor 
leagues, that is, general math and general science, before you come 
up to the big leagues. Then you may become a major league 
performer. Do you understand what I mean? 

Bob: I think I do. In other words, you are saying that by trying 
a subject that I am not ready for, I may fail it. 

Mr. Smith: Yes, not only is there a good chance of your failing, 
but more important, your failing or having extreme difficulty in 
passing may close that door of learning to you forever. That's one of 
the reasons we give achievement tests. We want to know how ready 
you are to learn new subjects based on your past learning. 

Bob: Thank you, Mr. Smith, I will tell my parents what you said. 

The case of Bob Fein is, of course, only one example of the many uses of 
achievement tests in helping to understand and guide students. This example 
reveals the practical importance of the use of achievement test results along 
with other data in educational guidance. 


Identification of Children for Intensive Study 

Achievement tests help spotlight those youngsters who need special 
attention. It is true, of course, that every child should be studied as an 
individual, and most of our schools attempt to do this. However, every school 
system has those pupils who need special assistance more than their peers. 
They are often difficult to distinguish from the others because they do not 
seek special help, nor do their teachers or parents often know of their 
problems. 




One way of finding out who these children are is to administer an 
achievement battery. 

The child who needs intensive study may show great differences in his 
performance on different subtests, or he may perform far below his grade or 
age level. Sometimes his performance, as in the case of Bob Fein, is far below 
his capabilities as shown by a group scholastic aptitude test. The school 
counselor or teacher, in studying a student's test performance, thinks of the 
following questions: 

1. Is the student's achievement related to his aptitude? That is, is 
the child falling behind the level we expect of him? 

2. Does he have a reading problem? If so, do we have a true indication 
of the child's academic aptitude based on 

3. Does he have a problem in some specific school subject? If the 
answer is yes, further study and testing may be needed. 
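The three warning signs described above (large differences among subtests, performance far below grade level, and achievement below what an aptitude test would suggest) can be sketched as a simple screening routine. Everything specific in this sketch is an assumption made for illustration: the record layout, the use of grade-equivalent scores for both achievement and expected level, and the cutoff values. It is a rough first pass to prompt the counselor's questions, not a substitute for them.

```python
from dataclasses import dataclass


@dataclass
class StudentRecord:
    name: str
    grade_placement: float            # e.g. 8.5 = middle of eighth grade
    subtest_grade_equivalents: dict   # subtest name -> grade-equivalent score
    expected_grade_equivalent: float  # hypothetical level implied by an aptitude test


def reasons_for_intensive_study(record, spread_cutoff=2.0, lag_cutoff=1.0):
    """Return the warning signs a record shows; the cutoffs are illustrative only."""
    scores = list(record.subtest_grade_equivalents.values())
    mean_score = sum(scores) / len(scores)
    reasons = []
    if max(scores) - min(scores) >= spread_cutoff:
        reasons.append("large differences among subtests")
    if record.grade_placement - mean_score >= lag_cutoff:
        reasons.append("performance far below grade placement")
    if record.expected_grade_equivalent - mean_score >= lag_cutoff:
        reasons.append("achievement below aptitude-based expectation")
    return reasons


# Hypothetical record loosely modeled on the counseling example in the text.
bob = StudentRecord(
    name="Bob",
    grade_placement=8.5,
    subtest_grade_equivalents={"reading": 7.0, "arithmetic": 7.2, "science": 6.8},
    expected_grade_equivalent=9.5,
)
print(reasons_for_intensive_study(bob))
```

A student flagged by such a routine still needs individual study; the flags only point to which of the questions above deserves attention first.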

In order to convey the practical ihe^uS 

at an actual case that I encountered because of marked 

schools. A fiftecn-ycar-old boy school record. The 

differences in his various test aotitude tests, such as mechanical 

boy had e.xcellent scores in His other apti^de 

reasoning, abstract reasoning. P . lest shovved an IQ 

scores were extremely imv His 

the level of mental f"”'**'"’"'. ^ ' ,„dies, an extremely high score m 

ces, such as a very low score m extremely high score m 

mathematics, an average score m L g . 

'IXided that an individual intelligent. 


'TSiided 

test were needed. ..... .,_j.,.ase 

superior rangeinintellecti^Jfi^^^^^^^ 


jperiorraogeinintellectualability^nd^'B administered the boy sms 

t reading achievement. Af" '’’“Vfnnhr, xtudy. During the coning 
referred to a school social f„,s of the boy's life nw 

sessions with the social worker, cetui physicians, had bee 

It developed that this boy s ’ (he boy was twelve years of ag . 

killed in an automobile accident ' „ok. 

since that time he had been fbistestperformancest e oy 

men confronted 'V* "^lin team because "I d.dn . feel V 

he had not tried to do his ^ bilitics were affecting 

meant anything an>'way. ,be boy * ^ j asked 

I now knew that ettnumstances ^ ^ “"*'‘‘“'‘1 ' „„l impor- 

his performance both in t e to^and the loss of his 

myself certain questions. the auto “““I'" .„jstions had to 

taut!" “Is there a t'l«""'*;l’ „ .S'' These and other quBt. , 

parents to his scattered P«f“®“tnid be made in counseling 
be answered before any progteB 




therefore decided to administer some personality tests. (See C apter 
explanation of projective and a "burnt child 

push orsCe L the auto accident and the ensu.ng He no'' Wt 

that his goal in life 'vas to replace his parents by ^ reality 

This of course, 'vas understandable. However, the child lo g ,. 
in th nking that he could become a doctor without passing courses not dirert J 
relamlm *e sciences. This is why he did not work in social studies a^ did 
not attempt to do well on achievement tests outside the 

Through the use of tests and interviews a clearer picture was obtained of 
this boy's problems, which were interfering with his ability to 
perform in school. In counseling sessions he was helped to think 
realistically about his goals and the means of obtaining them. This is an 
example of a boy whose vocational goals were in line with his abilities but 
who was unable to harness his abilities in all school subjects because of 
emotional problems. 

Of course, the above case is rather rare in the schools, for most children 
do not experience the shock of losing both parents. However, it is not rare 
for a child to be placed in a grade or level below his abilities because of poor 
achievement. Quite often this inability to use his potential is due to emotional 
problems. A careful comparison of the child's achievement test scores with 
his scores on aptitude and intelligence tests can help to identify those 
youngsters who need intensive study and help. 


Teacher Aids in Program Planning 

At the beginning of each school year the classroom teacher is usually 
given a general outline of subject material to be covered along with 
educational objectives. Decisions must be made concerning various subject fields 
and how much time should be given to a review of the previous year's work. 
In addition, the teacher is faced with the problem of planning independent 
work for those children capable of going beyond the regular classroom course 
of study. Most teachers want to form groupings within the class so that 
students of similar abilities and skills can work together at a common level. 
For example, the first-grade teacher may want eventually to group the 
children in reading in three or four sections. Section 1 might be the top 
group, section 2 might be the average, and section 3 might include 
students below average in reading. Section 4, if needed, might consist of 
those children who are not yet ready to learn to read. 

In order to carry out these educational plans and goals, the teacher needs 
to know her students' abilities, skills, and past learning achievements as soon 
as possible. One way to do this is to administer achievement tests (survey, 



diagnostic, or battery). The results of the tests will give the teacher an 
indication of the relative achievement level of her group of children, that is, 
whether the group is superior, average, or below average in the basic skills 
she will attempt to develop. The scores will provide clues to the group's 
strengths and weaknesses. The teacher can then adapt her plans to the group 
as a whole and to the individuals within the group. 

Of course, tests themselves are only one indication or clue for the teacher. 
The good teacher, in addition to using the results of tests, obtains certain 
impressions of her students by contact with them. 

of an indk-idnal child cams only /Aroigti ccack^S ■ 

however, enable the teacher to have an objectne frame 
addition to her subjective esttmalton of the group 

Planning and Evaluating Schoolwide Programs 

In order ,0 provide the best possible — ,__a achoo, m.t riiray^ 
examine and re-cxaminc its curncu gy^uation of the curriculum, 

achievement battery is used as . „„fh(,wwellaschoolorschool 

The results of the tests providesomeii^d^^ 

system is doing in relationship students’ test scores compared 

Junior High School wanted to find out ho Roosevelt therefore 

with the scores of other children >'’'““8''°“' ^ them to the national averages 
averaged its children’s >=*< Sa mid social studies Roosevelt s 
provided by rhe test publisher. '“"“h“otional standard or average, 
average student was one grade level national average. 

In mlt^hematics and English its studen Uvvere 
Roosevelt’s teachers were thus enabled 

grams of instruction. . this method for evaluating 

There are, of course, certain dang ' 8^^ does not ^ 

school’s course of study. Firs, of 

complete picture. The ‘“'X , snJall part of the obj«.n« 

skills it covers. Usually th«e s always present whether 

of our modern school- A danger ^ ^ive tests and then s a i g 

.ion is the relative simpl city of 

or not the eurriculum is “P.!“Xr.S result. 

methods of evaluation in a » ’ . ptogram- , before 

■eeper insights into ‘he problem of local be unfair to 

One must also remcmb ;„ctnictional program. 1 ^ below 


and oDjcvii'^ ^ 

“‘3:; xtraa^ts:: 

critically evaluating ^ adequate because th .jpphasis upon 

.ay thafa particular school ^ a^eq Son Ld 

the national average. A of its part ubr ^ 

:ertam subjects and delay school may ^^'1. If ,.ch a 

jducational philosophy. For . jban factual 

iinderstandin^ of subject ma 




school uses an 

students may do difficX 'v-ith national achievement tests 

battery in the light of its own particular goals and instructional prop • 

One must also take into account the geographic location of a school or
school system. Schools and communities differ in socioeconomic and cultural
levels. Related to these differences are the ranges of abilities found in
the public schools. These factors must be taken into account when
achievement is considered. These differences may be lessened by establishing
local norms for the school or school system. For example, students who attend
schools in the north suburban areas of Chicago generally come from privileged
homes. That is, they come from affluent families, who have a high
cultural level and who are able to provide their children with experiences
that many American families cannot afford, such as visiting the many museums
in Chicago, attending plays for children, and attending the children's
concerts of the Chicago Symphony Orchestra. In this kind of area the
average student does better than the average student in other areas of the
country. If the local schools in this area applied only national averages, they
would not have a complete picture of their instructional programs. But by
establishing local averages they can get a picture of how well the child is doing
when compared with his fellow students, who share similar backgrounds
and experiences. For the same reasons, a disadvantaged area in Chicago, or
elsewhere in the country, may want local averages, as well as national averages,
to obtain a more complete picture. By using local averages the disadvantaged
area can tell the relative progress of its students, who start out with so much less
than the average national student in abilities, motivations, and experiences.
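
The contrast between national and local averages can be made concrete with a small Python sketch. Everything in it is assumed for illustration: the score lists are invented, `percentile_rank` is a hypothetical helper, and the definition it uses (percent of the norm group scoring below, with ties counted as half) is only one common convention, not necessarily the one a test publisher applies.

```python
def percentile_rank(score, norm_scores):
    """Percent of the norm group scoring below `score`, counting ties as half."""
    below = sum(1 for s in norm_scores if s < score)
    tied = sum(1 for s in norm_scores if s == score)
    return 100.0 * (below + 0.5 * tied) / len(norm_scores)

# Invented raw scores, for illustration only.
national_norm = [30, 35, 40, 45, 50, 55, 60, 65, 70, 75]
local_norm = [50, 55, 60, 65, 70, 75, 80, 85, 90, 95]  # a high-scoring community

pupil_score = 65
national_pr = percentile_rank(pupil_score, national_norm)
local_pr = percentile_rank(pupil_score, local_norm)
print(national_pr, local_pr)
```

With these invented figures the same raw score stands well above the middle of the national group but below the middle of the local group, which is the kind of difference local norms are meant to reveal.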


Grouping 

To group or not to group, is that the question? How do you feel about this
question that has concerned educators for close to sixty years? The
bandwagon for and against grouping has changed directions many times in those
years. Since Sputnik (about 1957) the educational bandwagon has been
directed toward grouping children according to abilities and achievement.
The last few years, however, have seen some attempts to redirect the wagon
in the other direction. Our task here is not to affirm or negate educational
grouping, but to present the uses of standardized achievement tests in
grouping, if you as educators decide on that course of action.*

* I am personally very much for grouping because I see it as an instructional
technique in helping children learn. Even when grouping is practiced, there is still
a range of abilities and achievement in the classroom. It does, however, narrow the
range so that the teacher can gear her instruction to most of the children.




and third, a record of classroom , „f „i,tse, that children 

The use of these "Jq '.'! F aS," or achiesement test scores, 

arc not grouped according to IQ, g based on the students 

but according to specific and particu ar example, docs 

rvhole scholastic potential and ach|«me„t .^jXd rira.hema.ici course, 
not mean a student should be p .,,. 1 , (jig potential in order for him 

His achievement in mathematics must match his potential in order for him
to succeed in advanced mathematics.

Achievement tests help in providing a standardized picture
of a student's academic progress and in reducing teacher bias in academic
evaluation. They do not, however, provide the sole basis for grouping
decisions.


«««"■'* . I,.,!,!, aid in educational research. 

ThestandardiaedachievemenUmt^aU ^ „ 

For example, Hyde Senior High Scho gpp„ach. Two 

more suceessful than the Mgmry students are 

groups of eleventh-grade United Suw onc FOdP 

history achievement test in the bepnn^ 8^ "!*' mV^mn 

along the classical P=‘"™ 'j „f the same test 8>™"- Pj,gi„s 

end of the term a / . „ade. If the ®i!i„„^s in this 

between the group „ight say •‘“’/“'J'J.t'J’pnc teacher, one 

higher achievement scotK on^ g^ if he « 

instructional method is higher, one might have som 

class approach group leaching-^ f,„r/iers Yes, some schools 

about this use of acni 

should he 

=sg;isgfef‘’gg’HF.s 



as fi good ’ 




Obviously, .his procedure has mp- •.;^''’:J2rrbih.; 

achievement of a class .s ^ 'e“ fair ,o hold 

Secondly, as has been stated, achievement is based in part upon aptitude
and experiences not gained in school. Thirdly, an achievement test
can measure the success of only a small portion of the goals.

On the other hand, if we evaluate teachers collectively as a group
rather than as individuals, the achievement test may be of value.
That is, it may be used in research investigations of teacher effectiveness, but not
as a definitive guide for the continued employment of a teacher. A
school that uses the achievement test for teacher evaluation does not
truly understand the limitations of this form of testing and evaluation. If
the school administration had insight into these limitations, it would not
run the risk of unhappy teachers teaching “for the tests” rather than
the students.

Remember, the achievement survey, battery, or diagnostic test are only
instruments to be used in assessing the whole child. They have no inherent
power to provide complete answers by themselves, but when used with
other evaluative instruments and the school record they provide valuable
insights into the academic achievement of students. In the same way we can
use them to gain insights into the school's national standing and as an
evaluative tool in curriculum assessment as long as we remember they are
only one part of the whole academic picture. Let us end our discussion of
achievement instruments with the following summary statement concerning
achievement tests, from Educational Testing Service (1961), one of the
nation’s largest test publishers.


There is no single achievement test or test battery that will be
“best” for all pupil populations, all curriculum objectives, all purposes,
and all uses. Even tests universally recognized as “good” are not
equally “good” for different school settings, situations, and
circumstances.

The recommended procedure for achievement test selection then
consists of three phases:

1. study of our own school characteristics and testing needs;

2. analysis of characteristics and capabilities of available tests;

3. matching (a) the population of norm groups, of reliability samples,
and of validity studies for each test to our own pupil population,
(b) the content of each test to our own curriculum content and
objectives, (c) the validity evidence, reliability data, scoring system,
and interpretive material for each test to our own testing purposes
and prospective uses of test scores [pp. 32-33].




References 

BusM-ell G. T. Uamalfor the SRA Readitig Saord. (2nd ed.) Chicago; Science 

d" Tee, ef Weed Reeeg^ee " 

Snete teith eeggeetien. /or remediel eetee.,,,,. M.nncapoh,. Amencan 

E5u:r£XiilSch.cvc™„,.».s^acoa^ir^.^ 

(Ed.), Reading, in ptyehehgtenl ueu end mnsurementt. (Rev. ed.) n 
III.; The Dorsey Press, 1968. 

Educational Testing Service. Selecting an achievement test: principles and
procedures. (Number 3, Evaluation and advisory service series.) Princeton,
N.J.: Educational Testing Service, 1961.

Educational Testing Service. Cooperative Primary Tests. Handbook. Princeton,
N.J.: Educational Testing Service, 1967.

Kelley, T, L., Madden, R., ““'‘^".Ilw.'/wermediiKe I I” "" 

men, lesl—dirtemns for World, 1964. 

middle of grade 5. New York; Hareourt, ' j,, , sj„,, Foresman, 1967. 

Lindeman, R. H. Eduerrlmol New York: McGraw-Hill, 

Nunnally, J. C. Ednealmal mearuremeni rmdevema 



CHAPTER 

11

College Entrance
Examinations


The American college today is faced with the problem of accommodating
large numbers of young people who want a college education. The history
of higher education has been different from that of the public schools.
Americans have prided themselves on the fact that public schools were for
all children no matter what their academic potential or achievement. Colleges
and universities, on the other hand, have not attempted to educate all the
people; they have accepted only those with the academic abilities to profit
from a higher education. Therefore, colleges have always been faced with
the problem of selective admissions. As a consequence colleges have been
using standardized tests and other selective criteria even before the number
of college applicants became so large.

Colleges differ as to the testing instruments they use for selection. Some
administer their own tests; others prefer to use the services of the College
Entrance Examination Board (CEEB) or the American College Testing
Program (ACT). The basic difference between the Scholastic Aptitude Test
of the CEEB and the ACT lies in the grouping of items and in the scores
provided. The ACT provides tests organized into sections and produces
scores more nearly related to the traditional disciplines, whereas the Scholastic
Aptitude Test is organized in a verbal-quantitative manner (Juola, 1961).

Most of our discussion in this chapter will be devoted to the CEEB and
ACT, which represent the greatest percentage of admissions testing in the
United States. Let us first briefly review some of the other tests used by
colleges and administered at the local level.


College-Administered Tests 
Among the most prominent 

CEEB and ACT programs are the .^jCollegeAbdityTestsof 

Tests of thePsyehologleal Corporauon the^^J^a^^^^ Test, 

the Educational Tesung Service, .j,he College ClassificationTests 

Form 21 of the Ohio College “ L In terms of lAat 

of Science Research Associates. A b^ u„j„rsity Psychological Test 
they attempt to ? „„ ,:„e limit. The School and College 

differs from the others in that there is n mentioned m 

Ability Tests is the same j’ generally found on roost of the 

Chapter 9. To illustrate the Qu,! "fieation Tests. 

pre«ding tests, let us look multipurpose battery which 

^ The College Quahhcation T«ts (CQ ) placement, and counseling, 
serves as a blsis (partial) 

It is also used in some scholarship award prog 

• ■ hwlcallv a vocabulary test 'Vitn 

1 , CQT.VM This.s«,onwtapj^_^„„eerbal aptitude, 

areas of JP economics, , section yields nvo 

EuTjrsTuaentyerwmlhaa^^^^^^^^ 

subseores, Science and Social 

score. , L thirteen- The 

The level of the test ^ ^ 

different degree programs. 

College Entrance ExaininaH®'* .wtion testing P™?"'” 

One of the oldest and be^Jn^ari (CE®)- ’’ 

is the College Entrance Examm 





association of 579 colleges (as of 1966) with additional representatives from
secondary schools. Chauncey* and Dobbin (1966) in their discussion of the
history of testing state the basic reason for the creation of the CEEB:


The College Board examination program was started at the turn
of the century as the result of a proposal that colleges requiring
examinations for admission would do both the high schools and the
applicants a service by setting a common examination on which an
applicant could earn admission to any of a number of colleges. Until
the College Board was formed, a student who wanted to enter his
application at three colleges had to take three different examinations
at three different times and places. The principal of any high school
that had many college-going graduates had an exasperating time
trying to arrange for and comply with the multitude of examinations
his seniors needed to take . . . not to mention preparing the students
for the examinations [p. 17].


The first tests consisted of essay questions. In 1926 an objective test
called the Scholastic Aptitude Test (SAT) was used for the first time. The
use of the SAT increased slowly at first, but as more schools began to require
it for admission and as the number of college applicants increased, the SAT
became a “household name” to aspiring college students. In 1966 over 800
colleges and 250 scholarship programs required students to take the SAT as
part of the admissions process. In the years since 1926 thousands of research
studies have been conducted by these and other educational institutions,
as well as by private investigators, to ascertain the predictive validity of the
SAT (College Entrance Examination Board, 1966c).

The CEEB is a nonprofit membership association of colleges, secondary
schools, and educational organizations. The CEEB program of tests consists
of the Preliminary Scholastic Aptitude Test (PSAT), the Scholastic Aptitude
Test (SAT), and the Achievement Tests. The CEEB will make special
arrangements for administering these tests to handicapped students (College
Entrance Examination Board, 1966a).

The CEEB has its tests administered at testing centers throughout the 
world. These centers are usually located in a high school or college. The 
tests are administered by local qualified personnel either at the high school 
or college level. There is strict adherence to standardized procedures. The 
administrator is even required to call Princeton or Berkeley, long distance, 
for certain types of irregularities. 

Figure 30 shows the extensive coverage of the CEEB and the location 
and addresses of the two service centers. 

The centers administer the tests and send them back to the CEEB
headquarters in their geographic region for scoring. The results are then sent to


* President of Educational Testing Service, publishers of the CEEB tests.




[Figure 30. Map of CEEB testing coverage and the location of the two service centers.]

the student and his secondary school as well as to the colleges to which he is
applying for admission. Score reports are sent free of charge to the
colleges of the applicant’s choice. There is a nominal charge for each additional
report.

Test Dates and Schedule. Although dates and number of administrations
of the College Board may vary, the usual number is five times during a
calendar year. These are usually on Saturday during the months of December,
January, March, May, and July. The Scholastic Aptitude Test is administered
first at 8:30 A.M. and concludes approximately at 12:30 P.M. The Achievement
Tests and Writing Sample are begun at 1:30 P.M. and usually finish at a
P.M. 


The Preliminary Scholastic Aptitude Test 

The Preliminary Scholastic Aptitude Test (PSAT) is basically a shorter
version of the Scholastic Aptitude Test. The PSAT is given to students in
their junior year of high school. The College Board investigated students'
academic performance in college based on three years of high school, and
found that by the time a child is a junior in high school his performance in
college can be predicted by his three years of grades and an aptitude test
almost as well as by his senior year grades and test scores. Because of this
finding, the College Board decided to offer the PSAT to encourage earlier
college guidance of a higher level. Thus the College Board recommends the
PSAT for juniors. Colleges that are College Board members will consider
the PSAT scores in early counseling and in giving advice to prospective
students concerning their chance of acceptance and success.

The PSAT is a two-hour examination. There are two scores, verbal and
mathematical. The scores are reported on a scale ranging from 20 to 80.

Some educators may wonder if the PSAT is really necessary. You may be
saying to yourself, “Look here, by the time a student is in his junior year the
school already has information useful for counseling him about college plans.
I have also read that a pupil’s performance in college preparatory courses is
the most important indicator of college success or failure. And besides all
of the above, what about those standardized tests you have been talking
about? Isn’t that enough testing? Why is the PSAT needed or helpful
anyway?”

The PSAT may be helpful in several ways. For instance, the child who
gets all A's at one high school may be only a C student at another school.
The PSAT helps solve the problem of different standards among high schools
by giving us a national yardstick that can be used to measure a student’s
ability as compared to other boys and girls throughout the nation. In addition,
the school counselor will have information concerning the required scores
needed for admission on this test by various colleges and universities.
He, therefore, can help students in early planning for college in a realistic and
meaningful manner.




It is not wise, however, to assume that a student's PSAT score is the only
factor in predicting college success or failure. Suppose, for example, that
John scores below 30 on the PSAT and is thinking of going to college. His
academic record is good, and he seems to be motivated to work in school.
In addition to these factors his family can offer him strong financial and moral
support. If this is the case he can probably still gain admission to a college if,
with his counselor's or teacher's support and guidance, a college is carefully

selected. . . , .u. pcaT he mil be given his scores and a 

After a student has taken the . PreUminarv Scholastic 

l;rde'¥^L” ^rbookL'Intai: in— 
":;uunt;srnVs3rsm^^^^^^^^^^ 

Some Technical Details

The score range of the PSAT is 20 to 80 and is equivalent to the SAT range
of 200 to 800. The placement of a zero after the PSAT score reveals what the
student's comparable SAT score would be.

. . . your PSAT scores [may be] different from your [senior-year SAT
scores]. There is approximately 1 chance in 5 that your senior-year
SAT-verbal score will be 50 points or more higher than your predicted
SAT-verbal score, and there is also 1 chance in 5 that it will be
at least 50 points lower. The same is true for your mathematical
score [p. 12].
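
The relation between the two scales can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical, and the 50-point margin simply restates the figure quoted above (about 1 chance in 5 of landing 50 or more points higher, and 1 in 5 of landing at least 50 points lower, which leaves roughly 3 chances in 5 of falling inside the band).

```python
def psat_to_sat_scale(psat_score):
    """Place a zero after a PSAT score (20-80) to express it on the SAT scale (200-800)."""
    if not 20 <= psat_score <= 80:
        raise ValueError("PSAT scores range from 20 to 80")
    return psat_score * 10

def likely_sat_band(psat_score, margin=50):
    """Band of +/- `margin` points around the SAT score predicted from the PSAT."""
    predicted = psat_to_sat_scale(psat_score)
    return predicted - margin, predicted + margin

print(psat_to_sat_scale(45))   # a PSAT verbal score of 45
print(likely_sat_band(45))
```

The band is not clipped to the 200 to 800 range here; a fuller version would probably do so for scores near either end of the scale.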

I rsric Itself to comparisons ot 

lates in different years. It s „ Thisisbecauseajurnorw o 

,e considered in these compariso ^ ^ ^ l^yesenting the 

he same score as a senior y ^ interpreted as r P g^gnt 

The PSAT, like any other test, error of nf 

rxact college potential of sections is 20 is 

■or both the verbal and .he Kuder-Richardso^F^^^^^^^^^^^^ 

The reliability coefficient “ 4 fordiscussionofreliabi y 

3.88 for both sections. (See Chapter 

•rror of measurement.) 

The Scholastic Aptitude Test

The SAT is the oldest examination of the College Entrance Examination
Board. This test measures scholastic aptitude. Like the PSAT it has





two sections, verbal and mathematical. It is a test of ability, not of factual
knowledge. The verbal section emphasizes ability to read with understanding
and to reason with words. The reading material consists of passages from
such academic fields as the humanities, social science, and science. The
mathematical section measures the individual's aptitude in solving problems
and stresses mathematical reasoning rather than factual recall.

Verbal and mathematical talents are, of course, related to college success.
Usually the scores of any individual on the SAT are closely related to his
success as a student. One usually finds, therefore, that a student with a
good academic record will score high on the SAT, whereas the student who
has performed poorly in school is very likely to receive a low score. Of course,
there are exceptions to every rule; the SAT is far from being a perfect
predictive yardstick of collegiate success.

The administration time for the SAT is three hours. All items are presented
in an objective format.

Scores on the SAT, as you remember, are reported in numbers ranging
from 200 to 800. About two thirds of the students taking the SAT score
somewhere between 400 and 600. In terms of a general grading system,
in which zero is at the bottom and 100 is a perfect score, we can state that
200 is equal to zero, and 800 is equal to 100. Therefore, an individual cannot
receive a score below 200 or one above 800.
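
The comparison with a zero-to-100 grading system amounts to a straight-line rescaling, which the following Python sketch spells out. The second function checks the "about two thirds" figure under an assumption the text does not state: that scores are roughly normally distributed with a mean of 500 and a standard deviation of 100. Both function names are hypothetical.

```python
import math

def sat_to_percent_scale(sat_score):
    """Linear rescale of an SAT score so that 200 maps to 0 and 800 maps to 100."""
    if not 200 <= sat_score <= 800:
        raise ValueError("SAT scores range from 200 to 800")
    return (sat_score - 200) / 600 * 100

def share_between(low, high, mean=500.0, sd=100.0):
    """Share of a normal distribution (assumed mean and sd) lying between low and high."""
    def cdf(x):
        return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))
    return cdf(high) - cdf(low)

print(sat_to_percent_scale(500))            # the midpoint of the scale
print(round(share_between(400, 600), 3))    # share within one assumed sd of the mean
```

Under that assumed distribution a little over two thirds of scores fall between 400 and 600, which agrees with the figure given above.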

The standard error of measurement for both the verbal and mathematical
sections is approximately thirty points. The reliability coefficient using
the Kuder-Richardson Formula 20 is 0.90 for the verbal section and 0.
for the mathematical section.
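
The link between a reliability coefficient and a standard error of measurement can be illustrated with the usual textbook formula, SEM = SD × √(1 − r). The sketch below is hedged accordingly: the standard deviation of 100 is an assumption (the text gives only the reliability and the approximate standard error), and the function name is hypothetical.

```python
import math

def standard_error_of_measurement(sd, reliability):
    """Classical formula: SEM = SD * sqrt(1 - reliability)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)

# Assumed standard deviation of 100; reliability of 0.90 as reported for the verbal section.
sem_verbal = standard_error_of_measurement(sd=100.0, reliability=0.90)
print(round(sem_verbal, 1))
```

With these inputs the result is a little over thirty points, in line with the "approximately thirty points" reported above.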


Why Take the SAT?

Not all students should take the SAT. The student who does is interested
in obtaining admission to a college that requires the SAT as one of the
criteria for selection. It helps these schools in evaluating applicants from all
over the world by providing them with a standard frame of reference to
judge a variety of college applicants. It helps the student in evaluating the
college by comparing his scores with the average student at the college of
his choice.

Can Students Study for the SAT? 

The best answer to give to the question of whether one can study for the
SAT would be to say yes, if the pupil begins in the first grade of school.
Though this may sound like an evasion of the question, it is a most honest
and logical answer, because school skills are the direct result of practice
and instruction over a long period of time. You cannot expect Johnny to
study for a few weeks or even months and acquire the skills that he should
have learned years before. No one believes that one can learn to play the
trumpet or sing or become a major-league baseball player overnight. All




these skills require years of practice and development. By the same token
one cannot learn to read or reason logically in a few hard sessions of

enough to make it worth wbde. (See Chapttr 3 lor Mn ^ g 

this problem.) Thus, the best pr^ra i shout his environment 

wurk in earnest, read wddely, and^o^^ 3 j 

throughout his school je. cramming and come to the 

Examinations state that the stude ^ booklet 

test .veil rested. Each student L the student 

describing the test. In “ carefully^that he will understand the 

to the actual testing situation. 


Sample Questions

The verbal and mathematical sections of the SAT are divided into six
half-hour segments. As the test progresses the questions become more
difficult. The following items are similar to those in the verbal and
mathematical sections of the SAT.

VERBAL SECTION 

AiMiyms (Oppctiui) 3 „.„,d printed in capital 

FsSFEr-=""“- 

I^oo^tion: (A) slight (^‘‘Ltm.Itement 

(C) accurate representat. on J wholesome 

iimoNic: A) shgli. JJ 
fDl patient Ita) p 

' ' * . 1966-1967 edition ot 

■ The following 

nually and is rj-his boo^e* 1* thi College Entrance 94701. 






Sentence Completions

Directions: Each of the sentences below has one or more blank spaces,
each blank indicating that a word has been omitted. Beneath the
sentence are five lettered words or sets of words. You are to choose
the one word or set of words which, when inserted in the sentence,
best fits in with the meaning of the sentence as a whole.

High yields of food crops per acre accelerate the ______ of soil
nutrients.

(A) depletion 

(B) erosion 

(C) cultivation 

(D) fertilization 

(E) conservation

From the first the islanders, despite an outward ______, did what they
could to ______ the ruthless occupying powers.

(A) harmony . . . assist
(B) enmity . . . embarrass
(C) rebellion . . . foil
(D) resistance . . . destroy
(E) acquiescence . . . thwart


Analogies 

Directions: In each of the following questions, a related pair of words
or phrases is followed by five lettered pairs of words or phrases.
Select the lettered pair which best expresses a relationship similar to
that expressed in the original pair.

TRIGGER:BULLET::

(A) handle:drawer
(B) holster:gun
(C) bulb:light
(D) switch:current
(E) pulley:rope

BICYCLE:LOCOMOTION::

(A) canoe:paddle
(B) hero:worship
(C) hay:horse
(D) spectacles:vision
(E) statement:contention

General Understanding

This type of question measures scholastic aptitude by testing the
extent of information and the mastery of key concepts within a very
broad domain of knowledge: the arts and sciences, current events,
sports, and so forth. The emphasis is not on the ability merely
to remember facts in any given field; a wide range of information
is required to receive a high score on this group of questions.
Moreover, many of the questions test the ability to use information
in such a manner as to demonstrate an understanding in depth of a
particular concept. Thus, it is simple enough, given the concept
of evolution, to link it superficially to Darwin, but a better
understanding of this concept is demonstrated by correctly completing

tion to occur in the

(A) digestive system
(B) circulatory system
(C) respiratory system
(D) locomotor system
(E) nervous system

Reading Comprehension

In this type of question, passages from published material are presented
and questions follow that test the student's understanding of the
material. Approximately half of the testing time of the verbal section is
devoted to this task.

„ MATitEMATICAL SECnoN Student is presented 

[Note the folloiWng he problems. This eliminates, 

with this data to b''P . y f previous course work.) , y 

] The following information 

i the problems. 

Circle of radius r:

Area = $\pi r^2$

Circumference = $2\pi r$

The number of degrees of arc in a circle is 360.

The measure in degrees of a straight angle is 180.

If $\angle CDA$ is a right angle, then

(1) area of $\triangle ABC = \dfrac{AB \times CD}{2}$

(2) $AC^2 = AD^2 + DC^2$






Definitions of symbols:

< is less than
> is greater than
≤ is less than or equal to
≥ is greater than or equal to
⊥ is perpendicular to
∥ is parallel to


Examples: What is the weight of 28 feet of uniform wire if 154 feet
weigh 11 pounds?

(A) 2 lb. (B)^lb. (C)y!b. (D)71b. (E)141b. 

If 16 × 16 × 16 = 8 × 8 × P, then P =

(A) 4 (B) 8 (C) 32 (D) 48 (E) 64


The town of Mason is located on Eagle Lake. The town of Canton is
west of Mason. Sinclair is east of Canton, but west of Mason. Dexter
is east of Richmond, but west of Sinclair and Canton. Assuming they
are all in the United States, which town is farthest west?

(A) Mason (B) Dexter (C) Canton (D) Sinclair
(E) Richmond


If x > 1, which of the following increase(s) as x increases?



X 


' (x— x) 

III. 4r> -2x2 

(A) I only (B) II only (C) III only (D) I and III only 
(E) I, II, and HI 


The cost of electrical energy in a certain area is as follows:

                              Cents per kilowatt-hour
First 100 kilowatt-hours                3
Second 100 kilowatt-hours               2.5
Third 100 kilowatt-hours                2

How many kilowatt-hours can one obtain for $5.00?

(A) 175 kw-hr (B) 180 kw-hr (C) 200 kw-hr (D) 225 kw-hr
(E) 250 kw-hr




[Directions and explanation for solving another type of mathemat- 

;rScSgrr:~ 

sufficient to answer J“^J°!Iv“^OGETHER are sufficient to 
C if BOTH statements (1) and (2) TOGETHER are sufficient to
answer the question asked, but NEITHER statement ALONE is
sufficient;

D if EACH statement is sufficient by itself to answer the question
asked;

E if statements (1) and (2) TOGETHER are NOT sufficient to
answer the question asked, and additional data specific to the
problem are needed.

Example: 

What is the value of*. 

mpg = PK 

(2)j. = 40 



Explanation : jj pg = PR'. that 

- V«a'a“"' 

VhSoJlemcaiheao^=d,a^^ 

(2) The first three te 




If x is a whole number, is x a two-digit number?

(1) $x^2$ is a three-digit number.

(2) 10x is a three-digit number.


The preceding sample questions from the CEEB booklet have been
selected from a total of thirty-three questions, each of which has an
explanation of the method of solution and the correct answer. In addition
to these thirty-three questions and discussions of each, fifty-seven
questions are given with time limits which approximate the testing
situation. This booklet is an invaluable aid to students preparing for the SAT
and is the best way to study for the examination.
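
The arithmetic behind three of the mathematical sample items above can be checked with a short Python sketch. These are my own workings, not the booklet's explanations, and the function `kwh_for_budget` is a hypothetical helper written only for this illustration.

```python
def kwh_for_budget(budget_cents, tiers):
    """Kilowatt-hours obtainable for a budget, given (block_kwh, cents_per_kwh) tiers in order."""
    total_kwh = 0.0
    remaining = float(budget_cents)
    for block_kwh, rate in tiers:
        block_cost = block_kwh * rate
        if remaining >= block_cost:
            # The whole block can be paid for; move on to the next, cheaper block.
            total_kwh += block_kwh
            remaining -= block_cost
        else:
            # Only part of this block can be afforded.
            total_kwh += remaining / rate
            remaining = 0.0
            break
    return total_kwh

# Electrical energy item: 3, 2.5, and 2 cents per kilowatt-hour for successive blocks of 100.
tiers = [(100, 3.0), (100, 2.5), (100, 2.0)]
energy = kwh_for_budget(500, tiers)      # $5.00 expressed in cents

# Wire item: weight is proportional to length (154 feet weigh 11 pounds).
wire_weight = 28 * 11 / 154

# Product item: 16 x 16 x 16 = 8 x 8 x P, solved for P.
p = 16 ** 3 / (8 * 8)

print(energy, wire_weight, p)
```

Working in cents keeps the rate table in the same units as the item itself; the first $3.00 buys the first 100 kilowatt-hours, and the remaining $2.00 is spent at the second-block rate.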


Achievement Tests 

The College Board Achievement Tests are constructed by a committee
appointed by the College Board which includes both college and secondary
school teachers. These people work with measurement specialists of
Educational Testing Service. The development of the test follows the
procedure inherent in any proper standardized achievement test construction.
(See Chapter 10.)

The College Board Achievement Tests consist of the following one-hour
tests:


American History and Social Studies
Biology
Chemistry
English Composition
European History and World Cultures
French
German
Hebrew
Latin
Mathematics, Level I (Standard)
Mathematics, Level II (Intensive)
Physics
Russian
Spanish


Ten of the fourteen tests are administered at the regular testing dates;
the remainder are given at less frequent intervals. In addition, seven additional
tests, called Supplementary Achievement Tests, are available at no cost to
secondary schools. These tests are not given at the usual College Board
centers. They are usually given by the local school on the first Tuesday in
February of each year (College Entrance Examination Board, 1966).
These tests include the following:*


* New tests are being developed. These include a new two-hour test of composition
and of literature as well as a new language test which will be included in the
supplementary series.

French Listening Comprehension
Russian Listening Comprehension
German Listening Comprehension
Spanish Listening Comprehension

“S=: 

the SAT. . achievement tests, the questions 

In the following discussion of the specific achievement tests, the questions
presented represent some of those found in a College Board booklet
(1966b) which is available to students planning to take the achievement tests.
This booklet is intended to help the student
understand the types of questions that will be asked and the manner in
which they will be presented.

English Composition

The English Composition Test uses both multiple-choice and essay formats.
The intent of this test is to measure the student's ability to communicate
ideas correctly and effectively, to organize ideas logically, and to use language
with sensitivity to tone and meaning (College Entrance Examination Board,
1966b).

• There are four different types 

of the underlined parts or d 

of these examples follow: crtainly^ 

Had we known of yooK*”'"^*"^ B ^ 

invited you tojoin our party. 

D ce to Ae view that most apparel is bought 

Noonelindsdiflicul.yto^:^"'''' , 

for the sake of respectable app — 

C 

the person. No err^ 

1 _ f,om A DeseriptiM of the 

ition Board, New 
Board. 




Another type of question requires the student not only to recognize an error
but also to choose the best manner of phrasing the correction.
In the following sample question the student can choose (A), which repeats
the original sentence, or one of the alternatives.


For eight years he lives in New York and he still does not know how to
find Yankee Stadium.

(A) For eight years he lives in New York and
(B) Having lived for his last eight years in New York,
(C) Although he has lived in New York for eight years,
(D) Despite eight years’ existence in New York,
(E) After his having lived eight years in New York,


The third type of question, which deals with correct and effective
expression, is in a sentence format in which the possible source of error is
not underlined and no possible optional corrections are offered. Some of the
following illustrate these items:

No sentence has more than one kind of error. Some sentences have
no errors. Read each sentence carefully; then on your answer sheet
blacken space

A if the sentence contains faulty diction,
B if the sentence is wordy,
C if the sentence contains cliches or inappropriate metaphors,
D if the sentence contains faulty grammar or sentence structure,
E if the sentence contains none of these errors.

One of the sources of the king’s income was an entire monopoly of
the whole manufacture and sale of salt.

His mother mentioned that he will go to kindergarten next fall, but
she will be very much surprised if he liked it.

While the chemist was experimenting in his laboratory, he detected a
new cleansing compound.


The fourth type of question dealing with correct and effective writing is
similar to the one preceding, except the sentence is always correct. The
student is asked to change the sentence according to given directions, that is,
to “rewrite” it. In the following two examples the correct process and
procedure are illustrated:

Sentence: Coming to the city as a young man, he found a job as a
newspaper reporter.

Directions: Substitute He came for Coming.

(A) and so he found
(B) and found





(C) and there he had found 

(D) and then found 

‘■"li “hrS'sentence probably read: "He 

used one of the alternate phrases rvonld etan|^Aejneao^ 
intention of the origtnal senrenec, rvonld be a|»rlyn^^^^ 
rvodd be less effective than another possible revis . 

Sentence: Owing to her wealth, Sarah had many suitors.

Directions: Begin with Many men courted.

(A) so

(B) while

(C) although

(D) because

(E) and

Your rephrased sentence will probably read: “Many men courted
Sarah because she was wealthy.” This new sentence contains only
choice (D), “because.” None of the other choices will fit into an
effective sentence.

The Essay Test. The essay test requires the student to write on an assigned
topic. It is evaluated for the College Board by secondary school and college
English teachers, and the resulting grade is the English Composition
test score. An example of such an assignment is the following.


Directions: You will have 20 minutes to plan and write an essay on
the topic given below. [...] Express your ideas effectively. [...] You
must fit your essay on one side of the answer sheet. You will find that
the space [...] in which to write [...]



In addition to the essay test, sometimes the English Composition Test will
include a poorly written passage printed with spaces between the
lines so that the student can make appropriate corrections directly on the
paper.

The Writing Sample

The student is given one hour to write on an assigned topic. Copies of
the essay are forwarded to the student’s secondary school and to the colleges
where he is seeking admission. The essay in this sample is not graded by
the College Board.


Foreign Languages

Achievement tests in this area measure competencies in Latin, n . 
French, German, Russian, and Spanish. Percentile ranks are presented for
students having two, three, or four years of study.

The CEEB divides the questions into four main areas:¹

Situation Questions. Situation questions attempt to measure familiarity
with the language as it is used in everyday situations. They are intended to
test a student’s ability to read and visualize different kinds of situations and
to be conversant with colloquial speech.

Usage Questions. Usage questions attempt to measure correct language
patterns.

Vocabulary Questions. Vocabulary questions are presented in incomplete
sentences and several words or phrases. The student’s task is to choose one
of the words or phrases without altering the basic meaning of the sentence.

Reading Comprehension Questions. These items are generally based on
passages of from 100 to 300 words. They involve understanding the author’s
point of view.

In addition to English and foreign language tests, the CEEB offers, as 
has been previously noted, achievement tests in other academic areas. These 
tests cover three basic academic areas, history, mathematics, and science. 

History

The history area is divided into two tests, the American History and Social
Studies Test and the European History and World Cultures Test. Each test
contains around 100 questions and has a time limit of one hour. Content in
each test varies in difficulty and format. Some questions are presented in
the standard multiple-choice form; some require students to match a
statement with a given idea, event, or person in a list; and others require the
student to apply knowledge to presented material such as maps and graphs
or written statements. The student is required to exhibit knowledge of facts
and terms and the ability to apply this information.

¹ For the modern language tests. The Latin test is divided into two sections,
Vocabulary and Reading Comprehension.





Mathematics

The mathematics area is divided into two tests, Mathematics Test, Level I
and Mathematics Test, Level II. Mathematics Test, Level I (Standard)
attempts to measure broad general areas of mathematics generally covered
in college preparatory courses in mathematics. Half of the questions deal
with algebra and plane geometry; the rest range [...] geometry and
elementary [...]. It is not expected that students will have studied all the
topics presented.

Mathematics Test, Level II [...] with high ability in [...] accelerated or
advanced courses in mathematics. This test is only recommended for students
[...] college preparatory mathematics.

Science

The science area consists of three tests, one in biology, one in chemistry,
and one in physics. [...] (1966b) that these tests measure,

(1) the ability to [...] concepts and principles; (2) the ability to apply
these concepts and principles to familiar and unfamiliar situations; (3) the
ability to handle quantitative [...] (emphasized more in physics and
chemistry than in biology); (4) the ability to interpret [...]

arising in each field [P- 9*]- 

It is important to remember that [...] one to another. Colleges vary [...]
placement.

In order to understand better what these scores mean, one must consider
several factors. [...] achievement tests [...] be competing with [...] the
Advanced Mathematics [...] Social Studies [...] in relation to the number of
years [...] score [...] points higher than those with three years of study.


American College Testing Program 

One of the newer college admissions testing programs is the American
College Testing Program (ACT). The ACT is a federation of state programs
founded in 1959. It is an independent and nonprofit corporation. The
American College Testing Program (1965) states that their main purposes
are to


provide estimates of a student’s academic and non-academic
potentials which are useful in the admissions process

provide dependable and comparable information for pre-college
counseling in high schools and for on-campus educational guidance

provide information useful in granting scholarships, loans, and other
kinds of financial aid

help students present themselves as persons with special patterns
of educational potentials and needs

help colleges place freshmen in appropriate class sections in
introductory courses such as English and mathematics

help colleges identify students who would profit from special
programs such as honors, independent study, and remedial programs

help colleges estimate whether a student should be considered for
advanced placement and further examination with more intensive
advanced placement tests

help colleges examine and improve their educational programs
[p. 7].


There are over 1,500 educational institutions and agencies in fifty states
and the District of Columbia in the ACT program. This includes institutions
of diverse objectives and curricula such as technical schools, universities,
four-year liberal arts colleges, junior colleges, and nursing schools (American
College Testing Program, 1967a).

The ACT examination is administered five times a year in 2,000 test
centers in the United States and Canada. The testing dates are in October,
December, February, May, and August. In December and May the ACT
is also given overseas.





The ACT Test Battery²

The ACT Test Battery is made up of four tests: English Usage Test,
Mathematics Usage Test, Social Studies Reading Test, and Natural Sciences
Reading Test.

The English Usage Test [...] punctuation, usage [...]. The [...] and format
are similar to the English [...].

The Mathematics Usage Test is fifty minutes [...] mathematical
reasoning ability. There are [...]. The emphasis in the examination is on the
solution of practical quantitative problems that are [...] mathematics
courses. Mathematical techniques used in high school courses are also
covered.

The Social Studies Reading Test attempts to measure evaluative reasoning
and problem-solving skills [...] appraising student comprehension of reading
passages culled from representative social studies materials. In addition,
items that test understanding [...] knowledge of [...].

The Natural Sciences Reading Test [...] fifty-two [...] administration is
thirty-five [...] critical reasoning and problem-solving [...] in the natural
sciences. The major emphasis is on the [...] development and testing of
[...]. A sample passage and questions follow.

Zeolites, a special family of [...] peculiar behavior [...] each crystal
containing [...] heating [...]

² [...] sources for the [...] For specific [...], Iowa 52250. [...] (1967-68
ed.) [...]



Because heat so directly affects the behavior of [...], regulation of
temperature gives precise [...] properties of a zeolite. Molecules that are
[...] diameter of the apertures enter the crystals more [...]
temperatures. The “mesh” of the sieve can be altered not only by
changing its temperature but also by introducing different ions into
it. Ion exchange and temperature, working separately or together,
can adjust the mesh of a molecular sieve to the needs of a particular
separation problem.

12. Which of the following happens when zeolite crystals are heated?

A. Ion exchange takes place.

B. Small molecules enter the molecular sieve.

C. The crystals lose water vapor, which may later be recaptured.

D. The crystal decomposes permanently, with the loss of water
vapor.

13. X-ray diffraction of zeolite shows

A. numerous interstices

B. water molecules

C. charged atoms

D. the position of the exchangeable ions


ACT Scores and What They Mean

ACT uses two types of scores for reporting test results, standard scores
and percentile norms. ACT uses a scale from 1 (low) to 36 (high)
as its standard scores. The scale is the same for each of the four tests.
Raw scores are obtained by totaling the number of correct responses.
These raw scores are then converted into the ACT standard scores.
The standard scores are converted into percentile ranks in order to compare
students with others in specified groups. In addition to the scores on the
four tests, the student will receive another score. This score, called the
composite score, is an average of the four tests and indicates the student’s
general ability to succeed in college. Copies of the ACT scores, reported
in percentiles, are sent to the student’s high school and to three colleges
of his choice.
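
The arithmetic behind the raw score and the composite score can be sketched in a few lines of Python. This is only an illustration of the two steps the text states outright (counting correct responses, and averaging the four standard scores). The function names, the sample answer key, and the sample student are hypothetical, and the conversion from raw scores to standard scores is left out because ACT's conversion tables are not reproduced here. Whether ACT rounds the average is not stated above, so the sketch returns it unrounded.

```python
from statistics import mean

# The four tests of the ACT battery named in the text.
ACT_TESTS = ("english", "mathematics", "social_studies", "natural_sciences")


def raw_score(responses, answer_key):
    """Raw score: the number of responses that match the answer key."""
    if len(responses) != len(answer_key):
        raise ValueError("responses and answer key differ in length")
    return sum(1 for given, correct in zip(responses, answer_key) if given == correct)


def composite_score(standard_scores):
    """Composite: the plain average of the four standard scores (no rounding)."""
    missing = [test for test in ACT_TESTS if test not in standard_scores]
    if missing:
        raise ValueError(f"missing standard scores for: {missing}")
    return mean(standard_scores[test] for test in ACT_TESTS)


# Hypothetical student and answer key, invented for illustration only.
example = {"english": 18, "mathematics": 22, "social_studies": 20, "natural_sciences": 21}
print(raw_score("ABCDA", "ABCCA"))  # four of the five hypothetical items match
print(composite_score(example))     # average of 18, 22, 20, and 21
```

A composite computed this way is pulled down by one weak test and pulled up by one strong test, which is why the four separate scores are reported alongside it.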

The standard score system of ACT was developed to facilitate test
interpretation for teachers and counselors. It was originally constructed for the
Iowa Tests of Educational Development (ITED) and modified for ACT use.
The basic and major purpose is not only to help convey test scores in a
meaningful manner but to be sure that test scores are interpreted with proper
respect for errors of measurement inherent in the scores. The probable
error of measurement itself was used as the unit of measurement in the
scale and is slightly larger for the ACT-tested population than for the ITED.
It, of course, varies from test to test. The American College Testing Program
(1967b) states,




For the ACT population as well as for other populations, however,
the scale automatically prevents test users from attaching significance
to raw-score differences that are only a small fraction of the standard
error of measurement, since a given scale score spans many trivial
[...]

The following are a few of the important normative characteristics
of this scale.

1 is the lowest possible standard score.

[...] high school seniors [p. 8].

[...] types of institutions of higher learning [...]

1. [...]

2. [...] school seniors;

3. Type 1 (junior colleges, technical institutes, and normal schools
offering two-year programs).

4. Type 2 (schools offering only the bachelor’s and/or first professional
degree, for example, B.A., [...] bachelor of pharmacy,
or B.S. in engineering).

5. Type 3 (schools offering master’s and/or second professional
degrees).

6. Type 4 (schools offering Ph.D. and equivalent degrees).

6. Type i (sohoo'* ® „„rmatlve 

To illustrate how P''“"“’'j”,Janfhis scores, let us lo*^ 
groups may help the stu ^ j score of 20 on t e jgeted high 

Jack Ellis. Jack “ f "rcUo "'hen he r« i" 

He was ranked in the , collego-l/o';"'' s c-ileees, he was again 

school seniors; when ^ ^ ,o freshmen m Type ^ ^ in the 

55th percentile; when j to freshmen m Type 

in the 72nd percentile; " ^ freshman m ^'he was in the 41th 

62nd percentile; 'vh'" Type 4 

pdreentile; and when compare 

for jaek-a ae.,^.-- 




Thus these students represent a wide range of abilities. The College-bound
group is made up of students who took the ACT and are planning to go
to college. Note that Jack’s percentile rank in this group is 55 as compared to
72 in the Unselected group.* Now let us look at the four types of colleges.
Jack’s percentile ranking as compared to Type 1 students is the same
percentile ranking as in the Unselected group. This indicates that the Type 1
group probably has the same range of abilities as the Unselected group.
Type 2 shows a more select group, and Jack’s percentile ranking drops from
72 to 62. Type 3 is slightly more select than Type 2, and Jack’s ranking drops
four percentile points, whereas in Type 4, the most select group, Jack is
at the 44th percentile, indicating that in this type of college he would be
slightly below the average, whereas in Type 1 he was almost in the upper
quarter of students.

These reference or norm groups are extremely important in selecting an
appropriate college. Jack would probably be very successful, all other things
being equal, at a Type 1 college and moderately successful at a Type 2
college. Enrollment in Type 3 and Type 4 colleges would entail a great deal
more effort and present a greater possibility of failure.
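
The way one score yields different percentile ranks in different norm groups can be sketched in Python. The two small norm groups below are invented for illustration and are not ACT norms. The function also uses one common textbook definition of percentile rank (the per cent of the group scoring below the student, plus half of those tied with him), which is an assumption here, since the ACT tables themselves are not reproduced in this section.

```python
def percentile_rank(score, norm_scores):
    """Per cent of the norm group below `score`, counting ties as half."""
    if not norm_scores:
        raise ValueError("the norm group is empty")
    below = sum(1 for s in norm_scores if s < score)
    tied = sum(1 for s in norm_scores if s == score)
    return 100.0 * (below + 0.5 * tied) / len(norm_scores)


# Two hypothetical norm groups of ten students each (not real ACT data).
less_select_group = [10, 12, 14, 16, 18, 20, 22, 24, 26, 28]
more_select_group = [16, 18, 20, 22, 24, 26, 28, 30, 32, 34]

student_score = 20
print(percentile_rank(student_score, less_select_group))  # higher rank in the less select group
print(percentile_rank(student_score, more_select_group))  # lower rank in the more select group
```

The student's score never changes. Only the comparison group does, which is the point of Jack's case.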


ACT Self-reports

In the actual test setting the student is requested to report his recent grades,
before his senior year, in each of the four subject areas to be tested, English,
mathematics, social studies, and natural sciences. ACT (1967b) states,

Perhaps the most reliable research finding in education is that high
school grades are predictive of college grades; further, that academic
aptitude tests and high school grades combined are a better predictor
of college grades than either alone. This knowledge led ACT to
initiate regular collection of self-reported high school grades. . . .
These self-reports are considered estimates of high school academic
achievement, for presumably high school grades depend on both
academic aptitude and other characteristics such as persistence and
study habits [pp. 4-5].

ACT Student Profile 

The ACT Student Profile is an interesting and unique instrument, based
on the assumption that the quality of education a college provides is partly
dependent on the amount of relevant data it has about its students. The
Student Profile asks for this kind of information. For example, the student
indicates his vocational plans; his probable major area of study; the kind of
degree he wants; the size of his hometown; the kind of housing he wants at
college; and other data helpful to the college in student guidance. The ACT
Student Profiles may be used in admissions, planning freshman courses,
advanced placement, scholarship and loan programs, student counseling,
[...]

* Median score for Unselected group is 16.5 and mean score for Type 1 group
is 16.9. See Chapter 5 for ACT percentile tables.

[...] school grades and [...] factors bearing on success [...]. Holland and
Richards (1965), in a study of the academic and nonacademic
accomplishment of over 7,000 college freshmen, state,

Some of the practical applications [...]. If a
sponsor is interested only in finding students who will do well in the
classroom in college, then high school grades and tests of academic
potential are the best techniques available. On the other hand, if a
sponsor wishes to find [...] who will do outstanding things
outside the classroom [...] make an effort to secure a better record of
the student’s competencies and achievements in high school [...]
application blanks for admission [...] past achievement and involvement
[p. 22].

[...] universities. Grades in specific [...]. It was found that predictive
validity varied from school to school and course to course. It was also
found that [...] possessed about the same relative predictive validity
[...] either the ACT or [...]. Lins, Abell, and Hutchins (1966) [...]
School and College Ability Tests.



Barth (1965) found that success on the SAT and CEEB writing sample
did not require knowledge of the terminology of [...]. Barth stated,
“Only a knowledge of certain questions of usage and a sensitivity
to [...]

Hoyt (1968) in a study of 169 students from four colleges found a great
deal of diversity in freshman classes and colleges. Wide differences in grading
practices at different institutions were noted. These differences, according to
Hoyt, explain the low correlation that he found between the grades and the
ACT composite score. The correlation of the mean college grade point
average with the ACT was only 0.34. This finding confirmed earlier
studies.
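
To see how modest a correlation of 0.34 is, it helps to square it: the squared correlation is the proportion of variance in one measure that is associated with the other. The Python sketch below shows that arithmetic together with the standard product-moment (Pearson) formula for a correlation coefficient. The function names are hypothetical, and the formula is the ordinary textbook one, offered here as an assumption about how such a figure would be computed rather than as a description of Hoyt's own procedure.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Product-moment correlation between two equally long lists of scores."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of the same length, at least 2 long")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    if sum_xx == 0 or sum_yy == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return sum_xy / sqrt(sum_xx * sum_yy)


def variance_explained(r):
    """Proportion of variance shared by two measures that correlate r."""
    return r * r


# The correlation reported in the text between mean college grades and the ACT.
print(round(variance_explained(0.34), 4))  # roughly 12 per cent of the variance
```

In other words, a correlation of 0.34 leaves close to nine-tenths of the variation in college grades unaccounted for by the composite score.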


The Practical Meaning of College Entrance Examinations 

College admissions tests are not perfect, or even close to perfect, in
forecasting how a student will perform in college. There is no one device available
today to predict with 100 per cent accuracy a student’s future success or
failure in college. The tests that are available today, however, do present a
fairly good indication of a student’s chances when coupled with high school
performance.

Colleges do not rely on test scores as their sole criterion for admission.
The formula used by most colleges in determining admission is the high
school record (academic and extracurricular), teacher and counselor
recommendations, high school standardized test scores, and of course, college
entrance examinations.

Scores on the CEEB and ACT can only be interpreted in the light of the
individual college. A score of 500 on the verbal section of the SAT, for
example, may be a satisfactory score for College X and far below what is
acceptable for College Y. Soldwedel (1966), in discussing this problem,
states,


Generally you can assume that (1) low test scores may adversely
affect chances of admission; (2) high scores will help but do not by
themselves guarantee acceptance. The trouble with this generalization
is that it does not answer the questions How low? and How high?
How low is bad and how high is good will be determined by
whatever the traffic will bear. You may find it difficult to get answers to
the questions from admissions people. And there is good reason to




the reluctance. What colleges would like to get, in terms of student
body performance on standardized tests, and what they actually take
are often two different things. Certainly it must be added that there
are institutions with sufficient endowment funds to hold rigidly to a
test performance score. These are not in the majority however.
[...] to be saddled with the lowest score from year to year [p. 51].

Let us take the case of Jane and Bill to illustrate the place of college
admissions tests in screening applicants.

Jane is an attractive senior girl at Glen [...] High School. Her high school
record shows that she is a [...]. She has been a cheerleader for three years
[...] the editor of the school newspaper. Her [...] excellent
recommendations. On the SAT she obtained a score of 400 on the verbal
section and 350 on the mathematics section. Her achievement [...] and
social studies are all somewhere in the [...]. [...] generally, test scores
such [...]

The eastern school may feel Jane has overextended herself in high
school and could not live up to [...] standards. Thus, even though her
record is excellent, she may [...] admission to this type of school. Factors
such as the type of high school [...] attended would be very [...].
If Jane’s high school is one of excellent academic reputation, she [...]
admission. The important thing to remember is that Jane could be
admitted to other colleges in [...] her low college entrance examination
[...] admission to a [...] school because of her other credits. Thus Jane
is not doomed because of her [...] examination scores.

Now let us look at a different situation. Bill is a senior at North [...]
School. He is a rebellious boy who [...] grades in school [...] poor [...]
about homework. [...] does little in the way of extracurricular activities
and is [...] his teachers. His grade average is C. On the College Board
examinations he received the following scores: [...] 700; English [...]
600; French [...]. These scores indicate that Bill has the ability to succeed
in college. Even with these excellent scores, however, he may find some
[...] to him because of his poor [...]. He will probably gain admission to
some college because of his promising [...]




The reader can see from the two examples cited that test scores are not
the only criteria that the college uses for admission. Certainly most students
do not present such extreme cases. However, there are variations, and for
this reason the person’s whole record is taken into account. In Jane’s case
it is possible that she has developed good study skills and devotes a great deal
of time to her homework. The chances are that [...]
in her college work and help her earn higher grades than her College Board
scores would indicate, that is, if she has made good grades on quizzes and
final examinations in high school and has not received her high grades
because of her appearance and ability to get along with her teachers.

In the case of Bill we may state that in predicting success or failure in
college there are certain negative signs even though his test scores are very
high. Why? Because Bill has operated below his potential for so long, the
chances are that he will continue to do so in college. Study habits have to be
developed, and Bill has apparently not done this. In addition, he does not
seem motivated to do so. He may state that this will all change in college,
but the chances are that he will continue this pattern.

Of course, there are many exceptions to what has been stated. We are
speaking in general and in terms of probability, not certainty. There will be
students like Bill who not only change their school pattern but go on to be
scholars and prominent people in the arts and sciences. In general, however,
this does not happen. Still, because of the chance that it may, Bill should be
encouraged to go on to college, and counseling, to ascertain the reasons for
his underachievement, should be planned.

The important thing for those who are going to deal with students to
remember is that college admissions tests are to help students and are not
intended to hurt them. Remember that the individual who enrolls in a college
that is beyond his abilities may “flunk out” or quit in discouragement. If,
on the other hand, he is guided into a college that is commensurate with his
needs and abilities, he is more likely to complete his education.


References 

American College Testing Program. General information bulletin. Iowa City,
Iowa: American College Testing Program, 1965.

American College Testing Program. ACT student registration manual. (1967-68
ed.) Iowa City, Iowa: American College Testing Program, 1967.(a)

American College Testing Program. Using ACT on the campus, '67-'68. Iowa
City, Iowa: American College Testing Program, 1967.(b)

Angoff, W. H. The College Board SAT and the superior student. Superior Student,
1965, 7, 10-15.

Barth, C. A. Kinds of language knowledge required by college entrance
examinations. English Journal, 1965, 54, 824-29.

Boyce, R. W., and Paxson, R. C. The predictive validity of eleven tests at one
state college. Educational and Psychological Measurement, 1965, 25 (4), 1143-1147.




Chauncey, H., and Dobbin, J. E. Testing has a history. In C. I. Chase and H. G.
Ludlow (Eds.), Readings in educational and psychological measurement. Boston:
[...]

College Entrance Examination Board. [...] College Board Scholastic [...]

College Entrance Examination Board. [...] Scholastic Aptitude Test.
Princeton, N.J.: [...] 1966. (d)

Holland, J. L., and Richards, J. M. Academic and non-academic accomplishment:
Correlated or uncorrelated? ACT Research Reports, No. 2. Iowa City, Iowa:
American College Testing Program, 1965.

Hoyt, D. P. Description and prediction of [...] Measurement and Evaluation in
Guidance, [...]

Juola, A. E. Selection, [...]. In P. L. Dressel (Ed.), Evaluation in [...].
[...] Houghton Mifflin, [...]

Lins, L. J., Abell, A. P., and Hutchins, [...] Relative usefulness in predicting
academic success of the ACT, the [...] other variables. Journal of
Experimental Education, 1966, [...]

Munday, L. Comparative predictive validities of the American College Tests and
[...] other scholastic aptitude tests. ACT [...] Iowa City, Iowa:
American College Testing Program, [...]

Soldwedel, B. J. Preparing for college. New York: Macmillan, 1966.



CHAPTER 



Personality, Attitude, 
and Interest Inventories 


The last three chapters have focused on academic potential and perform- 
ance. These are the basic ingredients of school life. They are not, however, 
the only elements essential to success in school or life. The child (and the 
adult) must also be able to harness his talents in fields that satisfy his needs. 
The problem of vocational choice is of paramount importance to the in- 
dividual student and to the society that needs to utilize the resources of 
people. This is where interest inventories may be of assistance. 

In addition to helping children become aware of their interests the school
is also concerned with the mental health of its pupils. A child with an
emotional problem is sometimes unable to use his abilities and becomes a
school failure or dropout. The school is, therefore, vitally concerned in
helping its boys and girls find their interest areas and spotting those children
who need psychological guidance. This chapter will be devoted to illustrating
the nature, use, and reasons for personality, attitude, and interest tests in
the school.

Personality Inventories 

The word personality has different meanings to different people. To some
it is another word for popularity; the person either has it or lacks it. To many
[...] other psychologists personality comprises the total [...]
of a person. Obviously, personality is a broad and general [...]
professional people define differently. It is [...]
terms, for the definition is dependent upon the concepts of the individual
explaining its meaning.

Educators generally view [...] adjustment. They are
concerned with the functioning [...] the classroom. Is the child
well adjusted enough to learn, or [...] problems that interfere with
the learning process? When [...] answer to this and
other questions of adjustment, personality [...] may be administered
in the schools.

Personality inventories are [...] referred to as personality tests.
They are not tests, but [...] right or wrong answers to
specific items. The student is [...] respond to statements and questions
concerning his own [...] and school situation, and
[...] are many times specific: [...]

Projective tests are indefinite [...] require the person to [...]
unique manner. (See [...]) A personality inventory, on the other
hand, is structured [...] presented in an [...] format, similar
to tests of aptitude and achievement.

There are many personality inventories in [...] referred to by
[...] will briefly review [...] most widely used and
respected personality inventory. [...] Multiphasic Personality
Inventory [...]

Thorndike [...]

The Thorndike [...] describe [...] Note [...] and seniors in high [...]

Table 24. Description of TDOT Dimensions

From Thorndike Dimensions of Temperament, Manual, by Robert L. Thorndike.
(Reproduced by permission. Copyright © 1966, The Psychological Corporation,
New York, N.Y. All rights reserved.)




The test booklet presents twenty sets (labeled [...]) [...]
contains ten statements. The student is directed to [...]
statements and then go back and choose the three that are most like [...].
He signifies his response by blackening the [...]
beside the number of the item. He is then required to go back and blacken
the answer space marked D (different) beside the three statements that are

1. The program you watch most regularly on television is a news broadcast.

2. You are likely to keep people waiting for you.

3. Nothing seems to work out quite right for you.

4. You often seem to be given the “dirty” job to do.

5. You would rather read a history book than a novel.

6. You are usually “on the go.”

7. You tend to “blow up” in an emergency.

8. You look forward to the years ahead.

9. You usually plan things well in advance.

10. You generally find other people enjoyable.

Figure 31. Sample set of items from Thorndike Dimensions of Temperament.
(Reproduced by permission. Copyright © 1963, The Psychological Corporation,
New York, N.Y. All rights reserved.)


“most different from you.” Figure 31 illustrates the types of items that are 
presented. Thorndike (1966), in describing his test, states, 

Evidence tends to support the contention that the TDOT portrays
the individual both as he sees himself and as others see him. Though
it does not pretend to delve into deep layers of inner personality
dynamics, the inventory appears to be quite successful in providing a
differentiated picture of the manifest personality [p. 5].


The norms for each of the ten dimensions of the TDOT are based on
4,008 students in grades eleven and twelve from twenty high schools and
1,493 freshmen from ten colleges. Separate tables are presented for each sex.
In addition to examining these students, Thorndike asked each one to
complete six questions relating to home and family background and educational
and vocational aspirations. Variations in mean scores among certain subgroups
were found to be statistically significant.

The following are brief summaries of these findings as reported by
Thorndike (1966).


Sociable. The rural respondents appear less sociable than those
from [...] cities. There was a suggestion of higher sociability among
males majoring or planning to major in business.

Ascendant. Greater ascendance tends to go [...]
residence, [...] education among high [...]

Cheerful, Placid, Accepting. These traits show no clear [...]
rural groups. For the trait [...] with vocational plans [...] being high
for those planning agricultural careers [...] for those planning to enter
the professions.

Reflective. This [...] number of correlates. Among
[...] professional occupation [...] associated with an urban environment
[...] father, plans [...] major in languages or in [...]
professional career.

Impulsive. This trait shows few correlates. It has some tendency
to be high for girls [...] differences.

[...] scale also [...] highly educated parent [...]
suggestion that children [...] rate themselves to be [...]
arts rate themselves low on [...] [pp. ...].

Thorndike (1966) readily con[...] than the reported [...]
TDOT is lower than one is accustomed to expect in tests of ability, but
compares favorably with many other [...].
The student interested in pursuing the technical [...]
in terms of validity, reliability, and [...] measures is
referred to the TDOT test manual (Thorndike, 1966).


Mooney Problem Check List 

Another type of personality inventory is represented by the Mooney
Problem Check List. The Mooney, along with its sibling [...]
test; it goes one step further, however, in that it yields no scores. Mooney
and Gordon (1950), in describing the utility of their test, state that the
“usefulness of the Problem Check List approach lies in its economy for
appraising the major concerns of a group and for bringing into the open the
problems of each student in the group” (p. 3).


Major Uses

Some of the major uses of the Mooney in a school setting are the following:

Counseling

To prepare pupils for interviews by having them think over their
own problems.

To facilitate communication between counselor and student.

To save time for the counselor by providing a quick overview of
various problems.

Research

To provide information useful for curriculum planning, individualized
instruction and counseling needs.

To measure the effect of certain school activities and school-oriented
problems and objectives.

Homeroom and Group Guidance

To help students identify personal problems and needs.

To provide material for discussion in group guidance and orientation
programs.


The Mooney Problem Check List has four forms to be used for different 
age levels: junior high school (J), high school (H), college (C), and adults 
(A). They are all self-administering and for counseling purposes require 
no scoring. It should be noted, however, that for certain objectives, such as 
research purposes, the problems checked may be summarized by simply 
counting the total responses in a number of problem areas. These areas 
differ according to the form. The problem areas and number of items in 
each for the junior high school and high school and college forms are as 
follows (Mooney and Gordon, 1950): 



PERSONALITY, ATTITUDE, AND INTEREST INVENTORIES 

Junior High School Form 
210 items, 30 in each area 

I. Health and Physical Development (HPD) 
II. School (S) 
III. Home and Family (HF) 
IV. Money, Work, the Future (MWF) 
V. Boy and Girl Relations (BG) 
VI. Relations to People in General (PG) 
VII. Self-centered Concerns (SC) 

College and High School Forms 
330 items, 30 in each area 

I. Health and Physical Development (HPD) 
II. Finances, Living Conditions, and Employment (FLE) 
III. Social and Recreational Activities (SRA) 
IV. Social-Psychological Relations (SPR) 
V. Personal-Psychological Relations (PPR) 
VI. Courtship, Sex, and Marriage (CSM) 
VII. Home and Family (HF) 
VIII. Morals and Religion (MR) 
IX. Adjustment to College (School) Work (ACW) (ASW) 
X. The Future: Vocational and Educational (FVE) 
XI. Curriculum and Teaching Procedure (CTP) 
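
For research summaries of the kind just described, the counting of checked problems by area can be sketched in a few lines of Python. This is only an illustration, not the publisher's procedure: the function names are hypothetical, and the rule that items run in blocks of five cycling through the seven junior high school areas is an assumption that should be checked against the manual before any real use.

```python
from collections import Counter

# Area codes for the junior high school form, in the order listed in the text.
JUNIOR_HIGH_AREAS = ["HPD", "S", "HF", "MWF", "BG", "PG", "SC"]


def area_for_item(item_number, areas=JUNIOR_HIGH_AREAS, block_size=5):
    """Return the problem-area code for a 1-based item number.

    ASSUMPTION (not stated in the text): items are printed in blocks of
    `block_size` that cycle through the areas in order.
    """
    total_items = 30 * len(areas)  # 30 items per area, as stated in the text
    if not 1 <= item_number <= total_items:
        raise ValueError(f"item number must be between 1 and {total_items}")
    block_index = (item_number - 1) // block_size
    return areas[block_index % len(areas)]


def tally_by_area(checked_items, areas=JUNIOR_HIGH_AREAS):
    """Count how many underlined items fall in each problem area."""
    counts = Counter(area_for_item(n, areas) for n in set(checked_items))
    return {area: counts.get(area, 0) for area in areas}


if __name__ == "__main__":
    # Hypothetical pupil who underlined these item numbers.
    checked = [1, 2, 6, 7, 9, 63, 66, 70]
    print(tally_by_area(checked))
```

The tally is a summary of concerns, not a score; for counseling purposes the checked items themselves remain the useful information.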

The usual analyses of validity and reliability are not relevant to this type 
of instrument. The interested student should consult the manual (Mooney 
and Gordon, 1950) for specific types of validity, reliability, and norms 
reported. In addition, a bibliography of research and use of the instrument 
is presented. 

The following are some typical items from the junior high school form: 

Directions: Read the list slowly and as you come to a problem which 
troubles you, draw a line under it. 

 1. Often have headaches               61. Being teased 
 2. Don’t get enough sleep             62. Being talked about 
 3. Have trouble with my teeth         63. Feelings too easily hurt 
 4. Not as healthy as I should be      64. Too easily led by other people 
 5. Not getting outdoors enough        65. Picking the wrong kind of friends 
 6. Getting low grades in school       66. Getting into trouble 
 7. Afraid of tests                    67. Trying to stop a bad habit 
 8. Being a grade behind in school     68. Sometimes not being as honest as I should be 
 9. Don’t like to study                69. Giving in to temptations 
10. Not interested in books            70. Lacking self-control 

1 Sample items from the Mooney Problem Check List, Junior High School Form. 
(Reproduced by permission. Copyright 1950 by The Psychological Corporation, 
New York, N.Y. All rights reserved.) 


In addition to the 210 items there are three questions that ask the student 
to write in his own words about problems troubling him and whether 
he would like to spend some of his school time talking to someone about them. 

The Mooney in essence, then, is not a measuring device. 

Cronbach’s (1960) appraisal of it states quite cogently that the Mooney 
Problem Check List is of considerable value because it draws out the 
specific concerns the client is ready to talk about and wants help with. 
It is, in effect, a preliminary interview rather than a measuring device 
(p. 487). 


Minnesota Multiphasic Personality Inventory 

The Minnesota Multiphasic Personality Inventory (MMPI) was designed 
to provide an objective measure of some of the most important personality 
characteristics that relate to personal and social adjustment. It attempts to 
provide, in a single test, scores on all the most important aspects of personality 
(Hathaway and McKinley, 1951). 

The MMPI is one of the most respected and widely used personality 
inventories in existence today. More research has been done with it and on it 
than any other inventory now available. The diversity of problems that have 
been examined with the MMPI is truly outstanding. In addition to the 
numerous studies of a psychological and psychiatric nature, studies relating 
personality characteristics to success in practice teaching, cancer, brain 
lesions, multiple sclerosis, low-back pain, and characteristics of and criteria 
for nursing students have also been done. These are only a very small fraction 
of the various kinds of investigations reported. 

The first MMPI investigations appeared in 1940. The test materials and 
the first formal manual were published in 1943. The number of articles, 
books, dissertations, and other investigations dealing with the MMPI has 
been enormous. 


Format 

The MMPI consists of 550 statements to which the subject is asked to 
respond in three ways: “true,” “false,” or “cannot say.” The MMPI is 
given in two forms, individual (card form) and group (booklet). In the 
individual form the test administrator presents the subject with a box of 550 
small cards with printed statements on each. Instructions are on the cover of 
the box and ask the subject to sort the cards into three stacks according to 


the three preceding categories. The group form presents these same 



statements in a printed format in the usual type of group test booklet. The 
subject records his answers on a separate answer sheet, which may be hand- 
or machine-scored. The MMPI was designed for individuals sixteen years of 
age and older. 

The MMPI items cover a wide range of content, from statements dealing 
with health and neurological disorders to sexual, religious, and social 
attitudes. Items similar to those found in the MMPI are the following: 

I have been healthy during the past three years. 

LL“or.anrsSKLSrh^;«nght to children. 

The MMPI provides scores on nine clinical scales. 
They are as follows: 

1. Hypochondriasis (Hs).          6. Paranoia (Pa). 
2. Depression (D).                7. Psychasthenia (Pt). 
3. Hysteria (Hy).                 8. Schizophrenia (Sc). 
4. Psychopathic deviate (Pd).     9. Hypomania (Ma). 
5. Masculinity-femininity (Mf). 

The preceding scales 

following ‘'''■""'.''',°725+ pa'i'"'* jhringi 

Minnesota ’'“'’;‘L„ufofindividual5'^''"'’^i graduates who 

disease); a normal g P ciaes); pte«"'S 

to the University Clinic! ,„hich.r«<i"P''“""”° 

iAc.ua, iy,. he 



came to the University Testing Bureau for precollege guidance (265 cases), 
and patients (221) in the psychopathic unit of the University Hospitals and 
the outpatient neuropsychiatric clinic (Hathaway and 

In addition to the basic nine scales, a unique feature of the MMPI is its 
four “validity scales.” They are not validity scales in the usual testing sense 
but are attempts to note carelessness, malingering, misunderstanding, and 
test-taking attitudes of the subject. These four scales are the following: 


1. The Cannot Say Score (?). The “cannot say” score is the total 
number of statements that the subject responded to by not answering 
“true” or “false.” 

2. Lie Score (L). “True” responses to statements in the lie scale 
present the respondent in an unfavorable light; “false” responses 
present him in a favorable light. Dahlstrom and Welsh (1960), in 
referring to this scale, state that “these attributes are clear, unambig- 
uous, and generally socially unfavorable. It was assumed that most 
people would be willing to endorse the statements of the L scale as 
true about themselves even though the items deal with disapproved 
actions and feelings” (p. 49). 

3. Validity Score (F). The validity score is based upon a group of 
items rarely endorsed by the original standardization group, including 
the mentally ill patients. Many of the items relate to peculiar thoughts 
and beliefs and lack of control over impulses. It is highly unlikely 
that any subject would exhibit all or a majority of these behavioral 
patterns. The F score serves as a check on the whole record. A high F 
score indicates other scores are probably invalid because the subject 
was careless or did not understand the statements or that possible 
scoring errors were made. 

4. The K Score (K). The K score is the most recent of the validity 
scales and was developed as a measure of test-taking attitudes which 
appear either as defensiveness or as a need to represent one’s worst 
features. College students tend, as a group, to have higher K scores 
(psychological defensiveness and sophistication) than individuals of 
less sophistication and those who consciously or unconsciously desire 
to place themselves in a “poor light.” 


After scoring the MMPI, the examiner reviews the four validity scales 
before proceeding further. If any or all of these scales are over the designated 
ceiling, the record is considered suspect; if they are extremely high, it is 
considered invalid. 
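
The review of the validity scales can be pictured as a simple decision rule. The sketch below is illustrative only: the ceiling values, the margin that counts as "extremely high," and the function name are all hypothetical placeholders, since the text does not give the designated ceilings; the actual values must come from the test manual.

```python
# HYPOTHETICAL ceilings, for illustration only; the designated ceilings
# are given in the test manual, not in this text.
SUSPECT_CEILING = {"?": 70, "L": 70, "F": 70, "K": 70}

# HYPOTHETICAL margin: a score this far above its ceiling is treated
# here as "extremely high."
EXTREME_MARGIN = 10


def review_validity(scores, ceilings=SUSPECT_CEILING, extreme_margin=EXTREME_MARGIN):
    """Classify a record as 'acceptable', 'suspect', or 'invalid'.

    `scores` maps each validity scale ("?", "L", "F", "K") to its score.
    """
    status = "acceptable"
    for scale, ceiling in ceilings.items():
        score = scores[scale]
        if score > ceiling + extreme_margin:
            return "invalid"      # any extremely high scale invalidates the record
        if score > ceiling:
            status = "suspect"    # over the ceiling, but not extremely so
    return status


if __name__ == "__main__":
    print(review_validity({"?": 50, "L": 52, "F": 75, "K": 60}))
```

Only after such a review does the examiner go on to interpret the clinical scales.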

In addition to the basic nine scales of the MMPI, a tenth was added, 
the Social Introversion (Si) scale. Welsh and Dahlstrom (1956) present 
studies in which a high relationship was found between the Si scale and the 
number of extracurricular activities participated in by high school and 
college students. 



Over 200 new scales, in addition to those already mentioned, have been 

characterization of personality and as a screening 
device. (See, for example, Karmel, 1961.) 

Interpretation 

Normally the examiner uses only fourteen scales (nine “clinical,” plus Si, 
and four validity scales) and possibly several other “pet” scales of the 
examiner. Scores are converted to norms based on the original sample and 
are reported in the form of T scores (standard scores) with a mean of 50 and 
an SD of 10. A score of 70 or higher is generally considered the point above 
which psycho- 

Usually, the “bc^obtained. This is not 

oto^hfllderandstaple^^^ 

A/MP/(Hathaway sndMeehl,^,^^ MMPI proiae 

V^srS: ttTs” 0 - rrSt fege students is Orahe 

In addition to ■’’'^'relations. The J of diagnostic 

his disposal oompnter P ^ to a on'-P J P „ 32 presents 

an MMPI I'P'”"'"® “"S of an individual’s personality, t gl- 
and interpretive sta report. . ,er programming of the 

, ample of W „„„ egaged “ 3,966^967) has develop^ 

Other >nvestig-;“rs .rr^ For example. r^^rs bv computer. 


riglnally developed, It tan 



[Sample computer report, dated 8-28-67] 

MINNESOTA MULTIPHASIC PERSONALITY INVENTORY 

-PATIENT VIEWS SELF AS WELL-ADJUSTED AND SELF-RELIANT. 
-TENDS TO GIVE SOCIALLY APPROVED ANSWERS REGARDING 
 SELF-CONTROL AND MORAL VALUES. 
-INCLINES TOWARD ESTHETIC INTERESTS. 
-INDEPENDENT OR MILDLY NONCONFORMIST. 
-VIEWS LIFE WITH AVERAGE MIXTURE OF OPTIMISM AND PESSIMISM. 
-NUMBER OF PHYSICAL SYMPTOMS AND CONCERN ABOUT BODILY 
 FUNCTIONS FAIRLY TYPICAL FOR MEDICAL OUTPATIENTS. 
-RESPECTS OPINIONS OF OTHERS WITHOUT UNDUE SENSITIVITY. 
-HAS SUFFICIENT CAPACITY FOR ORGANIZING WORK AND PERSONAL LIFE. 
-LOW ENERGY AND ACTIVITY LEVEL. DIFFICULT TO MOTIVATE, APATHETIC. 
-HAS A COMBINATION OF PRACTICAL AND THEORETICAL INTERESTS. 
-PROBABLY SOCIALLY OUTGOING AND GREGARIOUS. 

The Psychological Corporation MMPI Reporting Service, New York, N.Y. 

Figure 32. Sample MMPI computer report. (Reproduced by permission. Copyright 
1951, © 1967 by the University of Minnesota. Published by The Psychological 
Corporation, New York, N.Y. All rights reserved.) 
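
The T scores used in reporting MMPI results (mean of 50, SD of 10, with 70 or higher as the customary point of concern) follow the ordinary linear standard-score formula. The sketch below shows that arithmetic only; the raw-score mean and standard deviation in the example are invented for illustration and are not MMPI norms, and the function names are hypothetical.

```python
def t_score(raw, norm_mean, norm_sd):
    """Convert a raw score to a linear T score (mean 50, SD 10).

    `norm_mean` and `norm_sd` describe the raw scores of the norm group.
    """
    if norm_sd <= 0:
        raise ValueError("norm_sd must be positive")
    return 50 + 10 * (raw - norm_mean) / norm_sd


def is_elevated(t, cutoff=70):
    """True when a T score is at or above the customary cutoff of 70."""
    return t >= cutoff


if __name__ == "__main__":
    # ILLUSTRATIVE norm values only (mean 14, SD 4); not actual MMPI norms.
    for raw in (14, 20, 22):
        t = t_score(raw, norm_mean=14, norm_sd=4)
        print(raw, t, is_elevated(t))
```

A T score of 70 thus lies two standard deviations above the mean of the norm group.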



paragraphs. Each of the lUl sc y . ,, ^ „„ the scale, 

eight possible ones, depending on the mdmdua ^ 

By this method, 101 statements are MMPI, such as 

statements written for the f f rem research and 

.■Hy” and ■■D," ■>« t -er the years, 

ar^.ritte„ to eliminate contradictions among 

statements (Finney, ’’*’>■ , individual scales, the report includes 

Besides the statements c ,ions or combinations of scales, and 
other statements based on c g dimensions: factore A 

statements based on rompu e , . , p (disability), and E (enmity), 

(anxiety), B (boldness), C h,„e, have subsequently 

Finney and his W"'”*";.®'- , as written for different purposes, 
developed several different Ws o F ^.nds of reports; 

The Finney-Auvenshme program now wri 

Finney and Auvensh.ne are n 


psychological tests. 

ne School and Fmooclity i„„„mries in the school r 

Educators may well ^ first state a toic "^j^^They 

ruUindieate rolfot^a *" t'o^rby ““ j 

In order to clarify *= ^.*,1 psy*^ > Bill's test record 

of Bill. Bill 't nabi itytRS"*”®'""^ 

worker because of his ina 





showed he was in the superior range of school ability. However, his grades 
were very poor, F’s and D’s. Bill would spend much of his time drawing 
pictures and “goofing off.” The social worker was unable to come up with a 
clear diagnosis, so after receiving parental permission, Bill was given the 
MMPI. The test results showed Bill was the type of boy who was unable to 
cope with his environment. It was obvious that Bill was sick and needed 
immediate treatment. Thus the social worker, on the basis of the MMPI 
and other data, referred Bill to an agency that could help him. 


Some Basic Problems in the Use of Personality Inventories 

One of the basic problems in administering personality inventories is the 
amount of reading required. A student who is slow in reading may tire of 
the test and respond without careful consideration of the test item. In addition 
to the amount of reading, difficulty with words is a problem. If the vocabulary 
and abstractness of the ideas are beyond the student’s comprehension, he 
may respond to the test item in a careless manner. Some tests, like the 
MMPI, have scales to detect this problem, but most others do not. 

Another difficulty is the reluctance of some individuals to be honest in 
their answers. For most personality inventories the person must be honest 
in his response in order to get a valid picture. Studies have shown that most 
personality inventories can be faked. That is, the person with some psycho- 
logical insight can give whatever picture of himself he desires. However, 
on the MMPI this is not as likely to happen, because there are scales that 
indicate whether or not a person is responding truthfully. 

The problem of faking means that a student’s scores on a personality 
inventory are subject to error, and the interpretation of these scores must be 
done with extreme caution. For example, let us suppose that a personality 
inventory is given to a ninth-grade boy who has been giving his teachers a 
difficult time. To this boy, teachers may be a symbol of authority against 
which he rebels. Therefore, if the test is given by a teacher or other authority 
figure, there is a good chance that he will not be honest in his answers. 
Secondly, one must remember that pupils are taught to do their best on 
tests in school. This being the case, many children are not going to reveal 
their personal problems. Besides, some children who reveal a healthy adjust- 
ment on an inventory may in actuality be defensive and unable to reveal their 
real problems. 

Remember also that personality inventories are middle class in thinking. 
Items often have different meanings for different socioeconomic levels and 
ethnic groups. 

The educator should view the use of the personality inventory as only one 
step in attempting to ascertain the problems that interfere with a student’s 
academic performance. Personality inventories may provide a quick gauge of 
what troubles the pupil. Most school counselors do not have the training 
required to administer more complex instruments such as projective devices, 
and they can use personality inventories as screening instruments in order 
to make intelligent referrals to appropriate resources in the community. 
Of all the inventories available, the MMPI is the most reliable and valid. 
Its judicious use by trained school personnel can facilitate behavioral analysis 
and lead to the proper handling of disturbed children. (For a technical review 
of the limitations and assets of personality inventories, see Megargee, 1966, 
Chapters 5, 6, and 7.) 


Attitude Inventories 


The attitude inventory is basically a self-report or questionnaire designed 
to measure a person’s bias toward some group, social institution, social 
concept, or proposed action. One of the most common and primitive forms 
of this type of appraisal is the opinion poll. In such a poll a group of questions 
might be presented to a community on a facet of school policy and the results 
tabulated to express the consensus of community opinion. 

The greatest use of attitude inventories has been in research endeavors 
which attempt to discover attitude differences, kinds of experiences that 
can change attitudes, and the influence of a man’s attitudes on his view of 
the world. Thurstone, for example, developed thirty scales for 
measuring attitudes toward war, Negroes, Chinese, capital punishment, 
church, censorship, and other practices, issues, and groups of people. 


Minnesota Teacher Attitude Inventory 

In a more immediate area of concern to educators is the Minnesota Teacher 
Attitude Inventory (Cook et al., 1951), designed to assess pupil-teacher 
relations. It is based on ten years of experimentation and standardized on 
teachers from different communities, schools, and grade levels. It is used in 
selecting teachers and in counseling student-teacher candidates. The range 
is from high school seniors through adulthood. 

The following items are similar to those found in this inventory: 

Most students are resourceful if you leave them on their own. 

A teacher should never let his students know that he is ignorant of 
a topic. 

This inventory is especially useful for research purposes, but a great deal 
of caution must be exercised in any practical usage such as counseling or 
selection. 


Survey of Study Habits and Attitudes 

Another attitude inventory of special interest to educators is the Survey of 
Study Habits and Attitudes (Brown and Holtzman). This survey was 
developed to help discover why some students with high scholastic aptitude 
do poorly in school, whereas others with only average ability do well. The 
authors recommend its usage as a: 

1. Screening instrument. Administered to twelfth grade or college 
students at the beginning of the school year. Used later for individ- 
ual counseling and as a technique to discover students who may 
need immediate help. 

2. Diagnostic instrument. Provides format for systematic recording of 
student’s feelings and practices involving schoolwork. 

3. Teaching aid. In elementary courses in psychology and education 
to help communicate effective methods of study. 

4. Research tool. In educational or counseling processes. 

The student is presented with 100 statements in the Survey of Study Habits 
and Attitudes (SSHA) booklet and is asked to respond, in terms of a five- 
point scale, to each statement by choosing one of the following: 

R - Rarely, 0 to 15 per cent of time 
S - Sometimes, 16 to 35 per cent of time 
F - Frequently, 36 to 65 per cent of time 
G - Generally, 66 to 85 per cent of time 
A - Almost always, 86 to 100 per cent of time 
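
The five response categories partition the range from 0 to 100 per cent without gaps or overlap, which a short lookup makes explicit. This is a minimal sketch for illustration; the function name is hypothetical, and the restriction to whole-number percentages is an assumption, since the booklet states the bands only in whole per cents.

```python
# Response bands exactly as listed in the text: (letter, label, low, high).
SSHA_SCALE = [
    ("R", "Rarely", 0, 15),
    ("S", "Sometimes", 16, 35),
    ("F", "Frequently", 36, 65),
    ("G", "Generally", 66, 85),
    ("A", "Almost always", 86, 100),
]


def response_for_percent(percent):
    """Return the response letter for a whole-number percentage of time.

    ASSUMPTION: percentages are whole numbers from 0 to 100.
    """
    if not isinstance(percent, int) or not 0 <= percent <= 100:
        raise ValueError("percent must be a whole number from 0 to 100")
    for letter, _label, low, high in SSHA_SCALE:
        if low <= percent <= high:
            return letter
    raise AssertionError("unreachable: the bands cover 0 to 100")


if __name__ == "__main__":
    for pct in (0, 16, 50, 85, 100):
        print(pct, response_for_percent(pct))
```

Note that the bands are unequal in width: "Frequently" spans thirty percentage points, while "Rarely" and "Almost always" span far fewer.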

The following are some sample statements from the SSHA: 

I believe that teachers intentionally schedule tests on the days following 
important athletic or social activities. 

I believe that a college’s football reputation is just as important as 
its academic standing. 

With me, studying is a hit-or-miss proposition depending on the 
mood I’m in. 

I am careless of spelling and the mechanics of English composition 
when answering examination questions. 

I believe that one way to get good grades is by using flattery on your 
teachers. 

I think that it might be best for me to drop out of school and get 
a job. 

Figure 33 presents a sample of a diagnostic profile for a college freshman. 
The profile includes the scales and what they mean. 

(Sample statements reproduced by permission. Copyright 1953, © 1965, 
The Psychological Corporation, New York, N.Y. All rights reserved.) 




Allport-Vernon-Lindzey Study of Values 


A different type of attitude inventory from those thus far cited is the Allport- 
Vernon-Lindzey Study of Values (1960). The Study of Values was developed 
to measure the relative prominence of six basic interests, motives, and other 
evaluative attributes. The classification, based upon Spranger’s Types of 
Men (1928), is as follows: 


Theoretical. The basic interest of the theoretical type of man is the discovery 
of truth. In pursuing this objective, he takes a “cognitive” attitude, that is, 
he is empirical, rational, and critical in his “intellectual” approach to life. 

Economic. The economic type of man is characterized by interest in what 
is useful and practical. This type is very similar to the stereotype of the 
average American businessman. This is the kind of man who wants education 
to be practical and looks upon pure research or unapplied data as wasteful. 

Aesthetic. The aesthetic type of man places the highest value on form and 
harmony; his major interest is in the artistic facets of life; each unique 
experience is encountered from the point of view of grace or symmetry. 

Social. The social type of man’s highest value is altruistic love or 
philanthropy. 

Political. The political type of man is characterized by his dominant interest 
in power, influence, and renown. This man does not necessarily engage in, 
or limit himself to, political activities per se; his power needs can be 
channelized in all sorts of activities and vocational pursuits. 

Religious. The religious type of man is mystical, concerned with the cosmos 
as a whole and relating himself to its embracing totality. 

The actual test is designed for college students and adults (with at least 
some college education) and consists of items based upon familiar situations 
to which alternative answers may be chosen. There is no time limit. The 
authors suggest three basic uses for the Study of Values: 

1. Counseling, vocational guidance, and selection. 

2. Classroom demonstration. As a teaching aid in beginning courses 
in psychology, education, and so on. 

3. Research. For investigation of group differences, changes in values, 
comparison with other attitude and interest scales, resemblances 
between peer groups and family, and so forth. 


Two sample items with directions from the Study of Values follow: 

[Part I. If the subject agrees with alternative (a) and disagrees with 
(b), he places 3 in the first box and 0 in the second box, or vice versa. 

rtf the Studv of Values as an stu'tucie jmenror\- « 

3 m fdS in «mmon .»iih «iter«t and pcrsnnai.ty imtrument, 

I. .. m t„. . of 

all three t>T>es. 


e 




Figure 33. Diagnostic profile for Survey of Study Habits and Attitudes. (Reproduced by permission. Copyright 
© 1966, The Psychological Corporation, New York, N.Y. All rights reserved.) 


Figure 34. Average profiles of males and females on the Allport-Vernon-Lindzey 
Study of Values. (From Allport-Vernon-Lindzey Study of Values (3rd ed.) by 
Gordon W. Allport, Phillip E. Vernon, and Gardner Lindzey. Copyright © 1960 
by G. W. Allport, P. E. Vernon, and G. Lindzey. Reproduced by permission of 
Houghton Mifflin Co.) 


If there is only a slight preference for one over the other, they are 
rated 2 and 1 respectively.] 

Example: 

When witnessing a gorgeous ceremony (ecclesiastical or academic, 
induction into office, etc.), are you more impressed: (a) by the color 
and pageantry of the occasion itself; (b) by the influence and strength 
of the group? 

[Part II. The subject is asked to rate four possible attitudes or 
answers in order of personal preference, giving 4 to the most 
attractive and 1 to the least attractive alternative.] 

Example: of discovery such as Columbus’s, 

b. they add to our knowledge of geography, meteorology, oceanography, 
etc. 

c. they weld human interests and international feelings throughout the 
world 

d. they contribute each in a small way to an ultimate understanding of 
the universe. 

Scoring is an easy task because no key other than the simple instructions 
on the detachable page of the test booklet is required. Total scores on the 
six values are plotted on the profile presented on the last page of the booklet. 
(See Figure 34.) Final scores are reflective only of relative trends because it is 
impossible to obtain high or low scores in all areas. 
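
Why the scores can show only relative trends is easy to see once the point system is written out: every Part I item distributes exactly 3 points and every Part II item exactly 10 (4 + 3 + 2 + 1), so a gain on one value must be offset by a loss on another. The sketch below illustrates this under stated assumptions; the keying of alternatives to values in the example is invented for illustration and is not the published scoring key, and the function name is hypothetical.

```python
VALUES = ("theoretical", "economic", "aesthetic", "social", "political", "religious")

# Part I: the two alternatives of an item share 3 points as 3-0 or 2-1.
ALLOWED_PART_ONE = {(3, 0), (0, 3), (2, 1), (1, 2)}


def score_responses(part_one, part_two):
    """Total the points credited to each of the six values.

    part_one: list of ((value_a, value_b), (points_a, points_b))
    part_two: list of ((v1, v2, v3, v4), (r1, r2, r3, r4)), ranks 4 down to 1
    """
    totals = {value: 0 for value in VALUES}
    for (value_a, value_b), (points_a, points_b) in part_one:
        if (points_a, points_b) not in ALLOWED_PART_ONE:
            raise ValueError("Part I points must be 3-0, 0-3, 2-1, or 1-2")
        totals[value_a] += points_a
        totals[value_b] += points_b
    for keyed_values, ranks in part_two:
        if sorted(ranks) != [1, 2, 3, 4]:
            raise ValueError("Part II ratings must use 4, 3, 2, and 1 once each")
        for value, rank in zip(keyed_values, ranks):
            totals[value] += rank
    return totals


if __name__ == "__main__":
    # HYPOTHETICAL keying of alternatives to values, for illustration only.
    part_one = [
        (("aesthetic", "political"), (3, 0)),
        (("theoretical", "economic"), (1, 2)),
    ]
    part_two = [
        (("economic", "theoretical", "social", "religious"), (1, 4, 2, 3)),
    ]
    totals = score_responses(part_one, part_two)
    print(totals)
    print(sum(totals.values()))  # always 3 per Part I item + 10 per Part II item
```

Because the grand total is fixed by the number of items, a profile can only be read as the ranking of values within one person, not as an absolute level on any one of them.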

Interest Inventories 

Each person makes a variety of decisions regarding the type of activities 
in which he will participate. Some individuals show preferences for sports, 
and others spend their time in reading or pursuing such hobbies as building 
model airplanes. Thus each person shows a preference for some activities 
and little interest or even aversion to others. Measuring these tendencies to 
like or dislike certain activities is the main objective of the interest inventory. 

Interest inventories are administered in the schools because it has been 
found that interests are related to academic success, job satisfaction, and 
eventual adjustment to and pleasure in adult life. For these reasons, it is 
important that every student have an understanding of his relative degree 
of interest in various activities. The counselor, in helping an individual 
student, wants the answers to such questions as: What are the interests of 
this pupil? How does his interest in science compare to his interest in social 
activities? How does his interest in a certain type of activity compare to 
those of other persons? 

But why give interest tests when all one has to do is to ask the person his 
likes and dislikes? After all, no one knows John as well as John knows 
himself. John may state that he likes arithmetic or wants to be an engineer, 
but such expressions of interest are of limited value. Authorities investigating 
this problem suggest that “single” expressions of interest may be unreliable 
and lack permanence. An individual’s statement that he is interested in 
being a fireman may be true when he is a certain age, but not necessarily 
true later in his life. People’s interests are influenced by many factors, and 
their expressed interests may not represent their true desires and needs. 

An adolescent from a middle-class community who is asked whether he is 
interested in going to college may answer affirmatively because he feels 





mathematics, it was assumed that engineering was a good vocational choice 

for him. 

After the test results were presented to Mr. Snow, he admitted that he 
was unhappy in his work and missed the opportunity to work with people. 
After a series of counseling interviews, Mr. Snow talked to his employer 
and asked to be given a chance in engineering sales. His employer granted 
his request, and today Mr. Snow is much happier in his vocational situation. 
A year after the interviews he said, “I am a new man in this work. For the 
first time in my life, I can be happy when Monday comes and not dread 
going to work.” 

Of course, not all people who rely on self-estimates in selecting their 
occupations are unhappy with their choices; some are very happy. Self- 
estimates are, however, often poor indicators for future occupational place- 
ment. This is the reason the counselor attempts to look at more than what the 
individual says he wants to be or do in life. 

As we have mentioned, a person’s stated interests may not always mirror 
his true feelings. Thus professionals in this field construct their instruments 
to ask a variety of questions concerning the person’s likes and dislikes. 
Questions such as whether he would rather read a book or go to the movies 
are asked rather than whether he would rather be a lawyer or a teacher. 

In the following discussion of interest inventories we devote our major 
attention to the most widely used and respected instruments, the Strong 
Vocational Interest Blank and the Kuder interest inventories. We also 
briefly review a newer instrument, the Minnesota Vocational Interest 
Inventory, which is a departure from the classical models. 

Strong Vocational Interest Blank 

The Strong Vocational Interest Blank (SVIB) was first published in 1927 
after several years of research; it has the longest continuous history of any 
widely used inventory. The inventory has been revised twice (1938 and 1966) 
since its initial publication. 

The basic purpose of the SVIB is best described in the following intro- 
ductory statements from the Manual (Campbell, 1966b): 

Men in different jobs have different interests. The Strong Vocational 
Interest Blank (SVIB) is a device to identify such differences among 
those occupations that college students usually enter. The SVIB 
accomplishes this by providing an index of the similarity between a 
person’s interests and those of successful men (or women) in each 
of a wide range of occupations. 

The Strong Blank is designed to help guide the student and the 
employee into areas where they are likely to find the greatest job 
satisfaction. It is emphatically not a measure of general or specific 
abilities, including intelligence. Such traits, which are probably 
more related to a man’s performance on a job than to his satisfaction 
there, should be determined by other means and considered along 
with interest measures and personal experiences in planning future 
careers [p. 1]. 


The SVIB is mainly middle-class in orientation in that it is most appropriate 
for the professions and business and social-service occupations. It is not 
usually administered to noncollege people because of the item content and 
research data, which have been structured and focused on men and women 
who have had some college education. The Manual (Campbell, 1966b) 
recommends that “with people under 17 years of age, the test should be used 
only with relatively mature boys and girls” (p. 2). 

Administration of the SVIB is relatively simple. The individual is given 
the booklet and an answer sheet and told to read the questions and respond 
to them according to the directions. Most people take from thirty to fifty 
minutes to complete the inventory; there is no time limit. There are 399 
items. The SVIB may be scored by hand, although this is a laborious task, 
or by four different test-scoring services. 

The men’s form was revised in 1966, and comparable research for a future 
revision of the women’s form is now under way. Let us direct our attention 
to the men’s form as illustrative of the basic content and format of the SVIB. 

The booklet is divided into eight sections. The first section (100 items) 
presents items (occupational titles) such as those in Figure 35, to which the 
subject is asked to respond by indicating one of three possible responses, 
“Like,” “Dislike,” or “Indifferent.” 


Figure 35. Sample items (occupational titles) from the Strong Vocational 
Interest Blank. (From Manual for Strong Vocational Interest Blanks by 
E. K. Strong, Jr., revised by David P. Campbell, with the permission of the 
publishers, Stanford University Press. Copyright © 1966 by the Board of 
Trustees of the Leland Stanford Junior University.) 




The other seven sections and their contents are as follows:

Section 2 (thirty-six items). School subjects.

Section 3 (forty-eight items). Amusements and hobbies.

Section 4 (forty-eight items). Occupational activities.

Section 5 (forty-seven items). Feelings toward different kinds of
people, for example, “military men,” “energetic people.”

Section 6 (forty items). Student is presented with ten items and
requested to select the three he least prefers.

Section 7 (forty items). Student is presented with pairs of activities
and asked to indicate in each case which he prefers, or he may state
he likes both.

Section 8 (thirty-nine items). Student is asked to respond to such
statements as “am always on time with my work” or “win friends
easily” by checking “Yes” (it is true of me), “?” (cannot say), or
“No” (it is not true of me).

There are fifty-nine scales for the men's form and thirty-four scales for the
women's form. The student's scores are presented on a specially devised
profile form. Figure 36 shows the men's profile form, illustrating the scales
with an actual profiled case.

High scores are those over 43 and low scores those under 25. To clarify
the use of the SVIB in an actual school situation the following case from the
Manual (Campbell, 1966b) is presented:⁷

This 18-year-old college student sought counseling near the end of
his freshman year to determine whether “I should continue with or
change my major,” which was engineering. He was maintaining about
a C average but complained of being bored by the courses.

In high school his favorite subjects had been mathematics and
science, his principal avocation music (he was an accomplished
pianist). His mother was a free-lance writer, his father an artistic
person who had been variously engaged in interior decorating and
window display work.

On the College Entrance Examination Board, using national norms,
he ranked at the 27th percentile on the verbal test and at the 90th
percentile on mathematics aptitude. On a test of mechanical comprehension,
he ranked at the 60th percentile of engineering freshmen;
on a test of spatial visualization, at the 93rd percentile. On an art
judgment test, he scored on the 54th percentile of art students. On
the Minnesota Multiphasic Personality Inventory, he had a peak of

⁷ Reprinted from Manual for Strong Vocational Interest Blanks for Men and Women,
by Edward K. Strong, Jr., revised by David P. Campbell, with the permission of the
publishers, Stanford University Press. Copyright 1959, 1966 by the Board of
Trustees of the Leland Stanford Junior University.



76 on masculinity-femininity, suggesting more “feminine” interests
and attitudes than usual, especially for engineering students.

His profile on the Strong Vocational Interest Blank (Fig. 36)
yielded high scores scattered through several groups. He showed very
strong resemblance to engineers and chemists, architects, physicians,
and musicians. In discussing the results, the counselor pointed out
that though engineering seemed compatible with his interests and
aptitudes, perhaps it might conflict with some of his values and
personality traits. The client observed that engineering “isn't very
creative” and that he found many fellow engineering students rather
unexciting personalities. He inquired about physics as a possible
major and was advised that his profile gave little reason to suppose it
would be better for him than engineering. The counselor suggested
that he consider architecture and provided him with occupational
information in this field.

In his evaluation of the SVIB profile, the counselor judged that the
high scores in artist, architect, and musician represented some
creative-artistic element in the client's make-up and felt that the Mf
scores on both the SVIB and the MMPI tended to support this
interpretation. The client had had excellent exposure to music and
had rejected it as a career. Neither had he shown any willingness to
pursue the biological sciences. The counselor felt that architecture
might fulfill his creative-artistic needs while calling upon his engineering
background, his superior mathematical aptitude, and his talent
for spatial visualization. The client tried courses in architecture,
liked them, and transferred to that department. He received his
bachelor's degree in pre-architecture with a superior record despite
his mediocre scholastic aptitude score. He was offered an apprenticeship
with an architect of international reputation on the basis of some
sample work and was thoroughly pleased with his vocational decision.

This case, of course, provides only one small illustration of how the 
SVIB is used to help students. The profiles are never (or at least should not 
be) handed to the student. In a one-to-one setting, the counselor should 
explain the meaning of the scores interpreted in the light of other data and 
should counsel accordingly. For a more definitive explanation of the SVIB 
in terms of content, counseling uses, and technical data, see Campbell, 
1966b, and Super and Crites, 1962. In addition, the studies by Harmon 
(1968) and Dolliver (1968) deal with the development and possible limitations
of the SVIB. 
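
The cutoffs given earlier for SVIB scores (high over 43, low under 25) amount to a simple three-way rule. The short Python sketch below only illustrates that rule. The function name, the label "middle" for the unnamed band between the cutoffs, and the sample scores are assumptions made for illustration and do not come from the SVIB Manual.

```python
def classify_svib_score(score):
    """Label a scale score using the cutoffs quoted in the text."""
    if score > 43:
        return "high"
    if score < 25:
        return "low"
    # The text does not name the band from 25 through 43; "middle" is our label.
    return "middle"


# Hypothetical profile: the scale names echo the case above, the scores are invented.
profile = {"Engineer": 52, "Architect": 47, "Musician": 45,
           "Banker": 18, "Physicist": 30}

labels = {scale: classify_svib_score(s) for scale, s in profile.items()}
high_scales = sorted(scale for scale, label in labels.items() if label == "high")
print(high_scales)
```

A counselor would still read such labels only alongside aptitude data and personal history, as the case illustrates.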

The long history of the SVIB has provided an opportunity to compare
data on men tested in the 1930's with that on men tested in the 1960's.
Some reports on these studies have already been published. For example,
one study was a comparison between the SVIB profile of 100 bankers who
took the inventory in 1934 and a similar 100 bankers (same jobs and exactly
[...] patterns to those of the 1930's [...].

Campbell (1968), in a review of some of the thirty-year comparisons, states
four main findings:

1. Several items on the SVIB show shifts in popularity in the last [...]

[...] collar and outdoor [...]

4. General shift [...] activities while outdoor and skilled trades are less
popular. Interests in other areas seem to be at about [...]


Kuder Interest Inventories

The first Kuder Preference Record [...] seven [...]

The third form [...] that almost every [...] because users [...] was included in
[...] (Kuder, 1964) [...] resemblance to [...] scoring technique [...]
bulk of our attention [...] personnel.





Kuder Preference Record Form C

This inventory has been widely used in the schools for a good many years. 
It is usually given in the ninth or tenth grade. The inventory consists of 
168 questions. Each question is made up of three choices of activity, to which 
the person taking the test must respond by choosing the one he likes most 
and the one he likes least. Figure 37 illustrates the actual set of instructions 
and sample problems. There is no time limit. The average time for high 
school students has been found to be around thirty minutes to one hour and 
for college students approximately forty minutes. Scoring may be done by 
hand if pins are used (Form CH, see Figure 37). The directions are easy to 
follow and students may score the inventory themselves. 

There are two profile sheets available for use in interpreting the nine 
scales and the Verification scale. One profile sheet contains the norms for 
boys and girls in grades nine through twelve and the other presents norms 
for men and women. The examiner first checks the Verification score to see 
whether the inventory has been carefully completed. Next, he locates the 
highest score, the two highest, or in some cases even the three highest. 
(There are many possible combinations— even no high scores). He then 
consults an occupational table which lists the scales and the various possible 
professions and vocations under that scale. For example (Kuder, 1960), 

Scientific

    Professional:
        Chemist
        County Agricultural Agent
        Dentist

    Semiprofessional:
        Aviator
        Weather Observer

    Clerical and Kindred:
        Physician's Assistant

    Protective Service:
        Detective

Form C may be used to help youngsters identify occupations they are
interested in for further study and as a check on their choice of occupation,
to see whether it is consistent with the type of thing they ordinarily prefer
to do.
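
The examiner's routine for Form C (check the Verification score, pick out the one to three highest scales, then consult the occupational table) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the table holds just the Scientific entries quoted above, the 75th-percentile cutoff for a "high" scale is assumed rather than taken from the text, and the function names and sample scores are hypothetical.

```python
# Excerpt of the occupational table for the Scientific scale, as quoted above.
OCCUPATIONS = {
    "Scientific": {
        "Professional": ["Chemist", "County Agricultural Agent", "Dentist"],
        "Semiprofessional": ["Aviator", "Weather Observer"],
        "Clerical and Kindred": ["Physician's Assistant"],
        "Protective Service": ["Detective"],
    }
}


def top_scales(percentiles, high_cutoff=75, max_scales=3):
    """Return up to three high scales, highest first (the cutoff is an assumption)."""
    high = [(name, p) for name, p in percentiles.items() if p >= high_cutoff]
    high.sort(key=lambda item: item[1], reverse=True)
    return [name for name, _ in high[:max_scales]]


def suggest_occupations(percentiles, verification_ok, table=OCCUPATIONS):
    """Follow the three steps: verification first, then high scales, then the table."""
    if not verification_ok:
        return None  # the inventory may not have been carefully completed
    return {scale: table[scale] for scale in top_scales(percentiles) if scale in table}


# Invented percentiles for one student.
student = {"Scientific": 88, "Mechanical": 60, "Clerical": 20}
suggestions = suggest_occupations(student, verification_ok=True)
print(suggestions)
```

A profile with no scale at or above the cutoff returns an empty result, which matches the remark that some students show no high scores at all.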

Kuder E General Interest Survey

This instrument, according to Kuder (1964), was constructed in response 
to the need for an interest inventory to be used with younger students, 
especially at the junior high level. Language was kept to a sixth-grade level.
In addition, other innovations were developed, such as an improved scale
for finding careless responses, lack of understanding, or faking. On the
technical side, longer scales were constructed because younger students'
responses tend to be less reliable.

[Figure 37. Kuder Preference Record, Vocational, Form CH: directions and sample items, for example "Visit an art gallery," "Browse in a library," "Visit a museum." (From [...] G. Frederic Kuder [...] Illinois.)]

There are 168 items on this form. The instructions and format are very 
similar to Form C. Figure 37 also serves as an illustration of Form E’s 
general instructions and items. It generally takes forty-five to sixty minutes
to administer; however, there is no definite time limit. Hand scoring is similar 
to that of Form C. 

The student can score and plot his own profile by following the directions 
on the Profile Leaflet provided. The first step is to check the Verification 
score. Students whose scores on this scale are under a certain number (15) 
may proceed to construct their profiles. Scores over the prescribed number 
indicate that the student’s responses may not be valid. 
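
The Verification step is a simple gate, sketched below in Python. The cutoff of 15 is the number given in the text. The text does not say how a score of exactly 15 should be handled, so the sketch conservatively sends it to review; that choice, the function names, and the sample scores are assumptions for illustration.

```python
VERIFICATION_CUTOFF = 15  # the number given in the text


def may_profile(verification_score):
    """True when the Verification score is under the cutoff."""
    # A score of exactly 15 is not addressed in the text; we treat it as not cleared.
    return verification_score < VERIFICATION_CUTOFF


def screen(scores_by_student):
    """Split students into those who may plot a profile and those needing review."""
    ready, review = [], []
    for name, score in scores_by_student.items():
        (ready if may_profile(score) else review).append(name)
    return ready, review


# Invented Verification scores for three students.
ready, review = screen({"Ann": 6, "Bob": 15, "Cal": 21})
print(ready, review)
```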

Figure 38 presents a portion of the Leaflet for grades 6 to 8. The Profile 
Leaflet also contains an interpretation of the various scales. A major portion 
of this interpretation follows. Note that it is directed to the student to help 
him understand the meaning of his scores. 


Interpreting Your Interest Profile⁸

You are interested in something if you enjoy doing it. Your interest 
profile indicates whether your interests in the ten areas measured are 
high, average, or low compared with those of other boys or girls at 
your grade level across the nation. 

A score above the top dotted line in any column is a high score.
It means that you have indicated a preference for activities in that
area more frequently than most young people at your grade level.
(The percentile on the same line as your score for an interest area
tells you what percentage of students expressed preference for activities
in that area less frequently than you did.) A score between the two
dotted lines means that your interest in the area represented is about
average. A score below the bottom dotted line is a low score. It indicates
that you have not expressed preference for activities in that area as
often as most young people.

Like most people, you probably have scores that are high in some 
areas, low in some, and average in others. Looking at all your scores is 
important, because most school subjects and jobs involve a combination 
of two or more interests. 

The more interested you are in a school subject, a job, or anything 
you do, the greater your chances are for success in it. It is easier and 
more satisfying to put your efforts into activities you enjoy than into 
those you dislike. Of course, no one can do only what interests him. 

⁸ From Profile Leaflet Grades 6-8, Kuder E. © 1963, G. Frederic Kuder. Reprinted
by permission of the publisher, Science Research Associates, Inc., Chicago,
Illinois.

[Figure 38. Portion of the Profile Leaflet, "How to Profile Your Scores." (From Kuder General Interest Survey [...] publisher, Science Research Associates, Inc.)]

[...] interests in certain areas. Imagine [...] who[se]
family and friends are not particularly interested in music [...]
has not had an opportunity to learn to play an instrument,
[...] to records, or to go to concerts. He may not score as high
in musical interest as someone who has had more [...]
music. You have to be introduced to or discover an [...]
you can like it or dislike it. Participating in something
you might like may in turn tend to strengthen your interest [...]
As you mature and are exposed to a variety of new [...]
of your old interests may change and some new ones may [...]

High interests are not better than low interests; nor is one interest
better, or worse, than another. What counts is knowing what
your interests are and considering them whenever you have an
important educational or vocational decision to make.

Here is what the ten interest areas measured by the Kuder General
Interest Survey mean.

Outdoor interest means preference for work or activity that keeps
you outside most of the time, usually work dealing with plants and
other growing things, animals, fish, and birds. Foresters, naturalists,
fishermen, telephone linemen, and farmers are among those high in
outdoor interest.

Mechanical interest means preference for working with machines
and tools. If you like to tinker with old clocks, repair broken objects,
or watch a garage mechanic at work, you might enjoy shop courses in
school. Aviator, toolmaker, machinist, plumber, automobile repairman,
and engineer are among the many jobs involving high mechanical
interest.

Computational interest indicates a preference for working with
numbers and an interest in math courses in school. Bookkeepers,
accountants, bank tellers, engineers, and many kinds of scientists are
usually high in computational interest.

Scientific interest is an interest in the discovery or understanding of
nature and the solution of problems, particularly with regard to the
physical world. If you have a high score in this area, you probably
enjoy working in the science lab, reading science articles, or doing
science experiments as a hobby. Physician, chemist, engineer, laboratory
technician, meteorologist, dietitian, and aviator are among the
occupations involving high scientific interest.

Persuasive interest is an interest in meeting and dealing with people. 




[...] in convincing others of the [...] in promoting projects [...]
managers, and buyers have high persuasive interest. If you have a
[...] may enjoy such activities as debating, [...]

Artistic interest indicates [...] with the hands, usually [...]
If you like to paint, [...] decorate a room, design clothes, [...]
probably high in this interest [...] dress designers, architects,
hairdressers, and interior decorators.

Literary interest is an interest in [...] writing. Persons with
literary interest include [...] editors, news reporters, [...]
a high score on the literary scale, English [...] probably [...]
favorite subjects, and you may enjoy writing for the school paper
[...] magazine.

Musical interest usually is [...] by persons who enjoy [...]
singing, or reading about [...]

[...] bookkeeper, [...] manager [...] commercial subjects [...]

The ten interest areas discussed here are examples [...] nor
is the classification system [...] interests may be [...]
(for certain kinds of people or situations). The [...] areas described,
however, are the broad fields [...] about school [...]

Scores in related interest areas are much higher for some occupations
than for others. [...] authors, [...] reporters are at
the 97th percentile in literary interest, [...] music teachers
are at the 95th percentile in musical interest. [...] mechanics and
repairmen, on the other hand, [...] at the 65th percentile [...]
their highest score; and surgeons are at the [...] percentile in scientific
interest.



Some occupational groups have scores nearly as high— or higher— 
in apparently unrelated interest areas as in related areas. In one 
survey the highest percentile for lawyers and judges was 82, on the 
literary scale. The next was 61, on both the musical and the clerical 
scales. In another survey the second-highest interest for surgeons 
was outdoor interest, with a percentile of 68. Their next-highest 
interest was literary, with a percentile of 66. The main reason for
results of this type is that people often have more than one strong 
interest. They may go into a career that makes use of their combined 
interests, or they may direct one or more of their strong interests into 
a satisfying after-work activity that provides a change of pace and 
broadens the range of their activities. Businessmen may find relaxation 
and a chance to get away from people and pressures in such activities 
as hunting and fishing. One retail salesman, for example, may enjoy
many do-it-yourself activities involving high mechanical interest. 
Another may enjoy fishing or gardening or acting in a community 
theater. A businessman may, in his free time, play in a string quartet, 
paint or sculpture, or engage in some other kind of activity not clearly 
related to the work he performs for a living. 

Knowledge of your interests can tell you only what you enjoy 
doing; it cannot tell you how well you do these things. What you do well 
depends on many things besides interest — particularly, your abilities. 
Your counselor can help you find out whether your abilities measure 
up to your interests. He can help you with your decisions about what
course of study and school subjects to take. Your counselor may also 
be able to suggest ways in which you can explore and broaden your 
interests — extracurricular activities you might enjoy, books appropriate
to your interests, and kinds of part-time or summer jobs you might 
want to consider. At various points during your school years — es- 
pecially before making plans for college or a job — you may want to 
reexamine your interests. Your counselor can suggest other Kuder 
interest inventories for this purpose. 
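
The leaflet's definition of a percentile (the percentage of students who expressed preference less frequently than you did) and its three bands can be worked through in a short Python sketch. The norm-group scores are invented, and the positions of the two dotted lines (here the 25th and 75th percentiles) are an assumption, since the passage does not state them.

```python
def percentile_rank(score, norm_scores):
    """Percentage of the norm group scoring strictly below the given score."""
    below = sum(1 for s in norm_scores if s < score)
    return 100.0 * below / len(norm_scores)


def band(percentile, low_line=25, high_line=75):
    """Place a percentile relative to the two dotted lines (line values assumed)."""
    if percentile > high_line:
        return "high"
    if percentile < low_line:
        return "low"
    return "average"


# Invented raw scores for a ten-student norm group on one interest area.
norm_group = [10, 12, 15, 18, 20, 22, 25, 27, 30, 35]

rank = percentile_rank(28, norm_group)  # eight of the ten scores are below 28
print(rank, band(rank))
```

Real norm tables are built from much larger national samples, so this shows only the arithmetic behind a percentile, not the published norms.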


The Survey is not intended to be the basis of vocational choice. It is 
intended as background information to be utilized in the whole process of 
choosing a career. In the seventh and eighth grades the Survey can help 
students choose electives, high school courses, and areas of study, for example, 
college track or commercial track. In the ninth or tenth grade the Survey 
provides the opportunity for the student to re-examine his interests, plan 
his high school program, or, if he has decided to drop out of school, help 
in choosing immediate vocational pursuits. 

The relationship of Form E to Form C was studied. Correlations were 
obtained ranging from 0.69 to 0.82 for boys, with a median of 0.76; and from 
0.65 to 0.86 for girls, with a median of 0.79 (Kuder, 1964). For definitive 




Kuder DD Occupational Interest Survey (OIS)

This interest inventory is primarily geared for high school juniors and
seniors to help them make immediate vocational or educational choices. It
also may be used with college freshmen and adults in employment counseling,
placement centers, and Job Corps type programs (Kuder, 1966b).

The format of the items is very similar to that of the other Kuder inventories.
There are 100 sets of three activities each, and the student is asked
to indicate his preference by marking one "most" and one "least" for each
set. The scores are reported and plotted on a special table. See Figure 39
for a sample report. The results show the degree to which the student's
preferences resemble those typical of people in various occupations and areas
of study. About 80 per cent of the people in the occupations and areas of
study listed (satisfied with their vocations) obtain scores of 0.45 or more on
the scale for their job or area of study (Kuder, 1966b).

An important feature of the OIS is the fact that scores for it represent the 
correlation of a person’s interest with those of people in a number of specific- 
ally defined occupations. (The person’s scores for these occupations are 
compared directly with each other.) This is a departure from other Kuder 
inventories, which are based on a general reference group. (See discussion 
on technical aspects of inventories at the end of this chapter.) In addition, 
the verification score is improved and should lead to the identification of 
faking or misunderstanding. 
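
The idea of scoring a person by how closely his responses correlate with those typical of an occupational group, and then comparing the occupational scores directly with each other, can be illustrated with a small Python sketch. It is a sketch under loud assumptions: the ordinary Pearson correlation is used as a stand-in because this text does not specify the OIS scoring statistic, the response coding (+1 for "most," -1 for "least," 0 for unmarked) and all data are invented, and only the 0.45 benchmark comes from the text.

```python
from math import sqrt


def pearson(x, y):
    """Ordinary Pearson correlation (a stand-in, not the published OIS statistic)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Invented coding: +1 = marked "most", -1 = marked "least", 0 = unmarked.
student = [1, 0, -1, 1, 0, -1]
group_profiles = {  # hypothetical "typical" response patterns
    "Forester": [1, 0, -1, 1, -1, 0],
    "Bookkeeper": [-1, 1, 0, -1, 0, 1],
}

scores = {name: round(pearson(student, profile), 2)
          for name, profile in group_profiles.items()}
ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
at_benchmark = [name for name, score in ranked if score >= 0.45]
print(ranked, at_benchmark)
```

Because every score is on the same scale, the occupations can be ranked against one another for a single student, which is the departure from the other Kuder inventories noted above.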


A Final Word on the Kuder Inventories 

A common denominator of all the instruments is the forced-choice format:
the student must answer "most" or "least" to three possible activities. They
all share a heritage of many years of dedicated research and analysis since
the publication of the first inventory in 1939. With all the research done by
Kuder and his associates as well as by independent investigators, it is of
interest to note the following caution in interpreting the results of any of
the Kuder instruments (or any instrument, for that matter):


Factors other than ability may make it unfeasible for a subject to give
serious consideration to an occupation or field of study in which he
has high interest scores. For example, a student with a high score on
the Farmer scale, for whom moving into a rural area is highly unrealistic,
should not regard farming as a good possibility for him to
explore. In any case the student should not feel pressured to pursue
investigation of a particular field, regardless of how high his interest
scores or how great his abilities in that field. The principal purpose
of his scores is to point out promising possibilities for future occupations
or studies, from the point of view of his own pattern of interests.
The scores should help him make decisions by suggesting a variety
[...] [p. 4].

[Figure 39. Sample report, Kuder Occupational Interest Survey. ([...] Science Research Associates, Inc., Chicago, Illinois.)]


Minnesota Vocational Interest Inventory

The Minnesota Vocational Interest Inventory (MVII) [...] information
on the interest patterns of men in [...] intended as a guidance tool
for [...] and other individuals who are [...] semiskilled and skilled
occupations (Clark and Campbell, 1965). Individual scores represent the
similarity of the interests of the person taking the MVII [...] a variety
of nonprofessional occupations.

The MVII has 158 [...] person responds by choosing the item he would
like [...] and the item he would like to do [...] Kuder Form C in terms
of the [...] minutes (Clark and Campbell, 1965).

[Figure: Minnesota Vocational Interest Inventory.]

Practical Implications of Interest Inventories

[...] between interest and ability [...] achievement in a field [...]
those of high ability in a field will show some [...] this relationship
[...] state that an inventory can be used to determine ability [...]
achievement [...] information [...] remember that [...]

Let us examine an actual [...] a career. His highest interest scores
are [...] reviewing the results of the [...] and achievement test scores
as well as his class [...] vocation. [...] the possibility of becoming an
architect, an artist, or a teacher of art. In addition, he points out
occupations at the semiprofessional [...] require a college education,
such as draftsman, decorator, and taxidermist. He does not, however, go
into other fields that do not require even a high school education, such
as upholstering or [...]. Although the job is still far from simple, the
interest test has narrowed the field, [...] and Jerry can now concentrate
on the occupations requiring [...]

In reviewing the interest patterns of a student, there are certain factors in
addition to those already mentioned that one must bear in mind. First,
interest patterns generally reveal themselves in mature children at the age
of fifteen or sixteen. However, some develop definite interests as late
as age twenty-two, and others may never develop these patterns. Second,
interest patterns generally seem to be established in a person before he
has had a chance to have extensive occupational experience. Third, just
because a person has certain interests, we cannot say definitely that he will
be successful in the areas of his interest. Other factors, such as ability,
must also be considered. It is self-evident, therefore, that interest
patterns cannot predict school achievement. Fourth, interest scores may
predict the relative happiness or feeling of satisfaction a person may
receive from certain types of work. Fifth, most people may be satisfied in
many different types of schoolwork and in a number of different jobs. Sixth,
if a person wants to fake his interests, he can do this easily. The Kuder has
keys that sometimes reveal faking, but they are not foolproof. Of course,
most people seeking vocational guidance tend to be honest in their answers,
at least at the conscious level. And seventh, motivation and personality may
enter the picture and distort the meaning of the interests of an individual
as revealed by an inventory (Carter, 1949; Strong, 1943, 1955; Super and
Crites, 1962; Darley and Hagenah, 1955; Campbell, 1966b, 1968; Kuder,
1964, 1966a).

The student of measurement should remember that interest inventories
are only systemized surveys of what a person is interested in. Don't forget the
person. Some people do not need to have an interest inventory to point out
their vocational choices. They are sure of what they want and where they are
going. Whether they are right or wrong is not the point; they have a right to
make their own decisions. Interest inventories should not be given on the
basis that “it's good for him” but on the basis of the needs of the individual
student. (For an excellent discussion of some of the practical problems in
interest measurement, see Weitz, 1968.)


Most of the major innovations in interest inventories have [...] on
empirical data. [...] both been intensively investigated [...] credentials
in terms of [...] validity and [...]. The interested student may pursue
this by referring [...] and other references previously cited in this
chapter. [...] however, that in general both yield scores that are
acceptable [...] reliability for teen-age individuals or older persons.
[...] reliability coefficients that are fairly comparable [...]

The main task in appraising the validity of interest inventories is judging
the honesty of the [...] state, “There isn't really any higher court of
appeal for determining a person's likes and preferences than the [...] own
statement” (p. 395). There is no doubt that a person can fake his [...],
but as we have stated before, if he comes of his own volition the [...]
conscious distortion [...]

In the area of [...] the occupational-interest scales for the Strong [...],
for example, [...] devised to distinguish members of occupational groups
[...]. Percentage overlaps [...] for each scale on [...] SVIB [...] range
from 15 to [...] and [...] to 42, with a median [...] inventories attempt
to distinguish [...] interest scores in [...] on the whole both the Strong
and Kuder have some validity [...] deal of accumulated evidence [...] of
vocational choice.

The validity and [...] of the Minnesota Vocational [...] is promising but
must [...] research before it can take its place next to the Strong and
Kuder.

The selection of one interest [...] another is dependent on
pupil needs, age, and occupational horizons. As a general rule, the SVIB
is never given before the senior year of high school, whereas the Kuder
inventories may be first administered in junior high school.

If the teacher or counselor recognizes that the interest inventory is tentative
and that it is not a measure of ability but only of declared interests and uses
it along with other data, most students can profit from the experience of
taking an interest inventory.

⁹ The percentage [...]

¹⁰ See Findley (1966) [...]


References 

Allport, G. W., Vernon, P. E., and Lindzey, G. Study of Values: Manual. (3rd ed.)
Boston: Houghton Mifflin, 1960.

Buros, O. K. The sixth mental measurements yearbook. Highland Park, N.J.:
The Gryphon Press, 1965.

Brown, W. F., and Holtzman, W. H. Survey of Study Habits and Attitudes: Manual.
New York: The Psychological Corporation, 1966.

Campbell, D. P. Stability of interests within an occupation over thirty years.
Journal of Applied Psychology, 1966, 50, 51-56. (a)

Campbell, D. P. Strong Vocational Interest Blanks: Manual. (Rev.) Stanford, Calif.:
Stanford University Press, 1966. (b)

Campbell, D. P. Changing patterns of interests within the American society.
Measurement and Evaluation in Guidance, 1968, 1, 36-49.

Carter, H. D. Vocational interests and job orientations: A ten year review. Stanford,
Calif.: Stanford University Press, 1949.

Clark, K. E., and Campbell, D. P. Minnesota Vocational Interest Inventory:
Manual. New York: The Psychological Corporation, 1965.

Cook, W. W., Leeds, C., and Callis, R. The Minnesota Teacher Attitude Inventory.
New York: The Psychological Corporation, 1951.

Cronbach, L. J. Essentials of psychological testing. (2nd ed.) New York: Harper &
Row, 1960.

Dahlstrom, W. G., and Welsh, G. S. An MMPI handbook: A guide to use in clinical
practice and research. Minneapolis: The University of Minnesota Press, 1960.

Darley, J. G., and Hagenah, T. Vocational interest measurement: Theory and practice.
Minneapolis: The University of Minnesota Press, 1955.

Dolliver, R. H. Likes, dislikes, and SVIB scoring. Measurement and Evaluation in
Guidance, 1968, 1, 73-80.

Drake, L. E., and Oetting, E. R. An MMPI codebook for counselors. Minneapolis:
University of Minnesota Press, 1959.

Findley, W. G. The Occupational Interest Survey. Personnel and Guidance Journal,
1966, 44, 72-77.

Finney, J. C. A programmed interpretation of the MMPI and the CPI. Archives
of General Psychiatry, 1966, 15, 75-81.

Finney, J. C. Methodological problems in programmed composition of psychological
test reports. Behavioral Science, 1967, 12, 142-52.

Fowler, F. M. Interest measurement questions and answers. School Life, 1945
(December).

PERSONALiry, AITITUDE, AND INTEREST INVENTOMES 36, 

SnmP in intetst mrasnrcment. 
merit ana Eialuation in Guidance, 19(58, 1, 6S-72 

Hathaway, S. R., and McKinley, J. C. Booklet for the Minnesota Multiphasic 
Personality Inventory. New York: The Psychological Corporation, 1943. 

Hathaway, S. R., and McKinley, J. C. Minnesota Multiphasic Personality Inventory: 
Manual. (Rev. ed.) New York: The Psychological Corporation, 1951. 

Hathaway, S. R., and McKinley, J. C. Construction of the schedule. In G. S. Welsh 
and W. G. Dahlstrom (Eds.), Basic readings on the MMPI in psychology and 
medicine. Minneapolis: University of Minnesota Press, 1956. Pp. 60-63. 

Hathaway, S. R., and Meehl, P. E. An atlas for the clinical use of the MMPI. 
Minneapolis: University of Minnesota Press, 1951. 

Hathaway, S. R., and Monachesi, E. D. Analyzing and predicting juvenile delinquency 
with the MMPI. Minneapolis: University of Minnesota Press, 1953. 

Hathaway, S. R., and Monachesi, E. D. An atlas of juvenile MMPI profiles. 
Minneapolis: University of Minnesota Press, 1961. 

Karmel, L. J. An analysis of the personality patterns, and academic and social back- 
grounds of persons employed as full-time counselors in selected secondary schools in 
the state of North Carolina. (Doctoral dissertation, University of North Carolina), 
Ann Arbor, Mich.: University Microfilms, 1961. No. 62-3134. 

Kuder, G. F. Administrator's manual, Kuder Preference Record Vocational-Form C. 
Chicago: Science Research Associates, 1960. 

Kuder, G. F. Kuder E General Interest Survey manual. Chicago: Science Research 
Associates, 1964. 

Kuder, G. F. Kuder DD Occupational Interest Survey general manual. Chicago: 
Science Research Associates, 1966.(a) 

Kuder, G. F. Kuder DD Occupational Interest Survey Interpretive Leaflet Grades 
11-12. Chicago: Science Research Associates, 1966.(b) 

Megargee, E. I. (Ed.) Research in clinical assessment. New York: Harper & Row, 
1966. 

Mooney, R. L., and Gordon, L. V. The Mooney Problem Check Lists: Manual. 
(Rev. ed.) New York: The Psychological Corporation, 1950. 

Spranger, E. (translated by P. J. W. Pigors) Types of men. Halle: Niemeyer Verlag, 
1928. 

Strong, E. K. Vocational interests of men and women. Stanford, Calif.: Stanford 
University Press, 1943. 

Strong, E. K. Vocational interests 18 years after college. Minneapolis: University of 
Minnesota Press, 1955. 

Super, D. E., and Crites, J. O. Appraising vocational fitness: By means of psychological 
tests. (Rev. ed.) New York: Harper & Row, 1962. 

Thorndike, R. L. Thorndike Dimensions of Temperament: Manual. New York: 
The Psychological Corporation, 1966. 

Thorndike, R. L., and Hagen, E. Measurement and evaluation in psychology and 
education. (3rd ed.) New York: Wiley, 1969. 

Thurstone, L. L. The measurement of values. Chicago: University of Chicago Press, 


WeiW, "h. Some practical problems in interest measurement. Measurement and 

\vS'”g r g! (Eds.)B»*.,a.Im,ion ra/W/n wrf-fcS’ 

wid'midim.. Muiiiwpolis: Universily of Minnoot, Press, I9jS. 



CHAPTER 

13 


Teacher-Made Tests 


Like it or not, one must measure outcomes of instruction. It is true that 
the experts who specialize in test construction do not always do a good job, 
even though they have the assistance of subject-matter specialists to help them. 
We have seen the problems of validity and reliability in the construction of 
standardized tests. How can one, then, expect a classroom teacher who has 
had one course in testing to do a good job? 

It is true that no matter how much you read about classroom tests in this 
or other books, you will never conceive or produce a test that will really meet 
the ideal standards of test construction. However, you will probably do 
well enough for your own purposes, if you follow correct test construction 
methods. Reading the following material will not solve all your test problems, 
but it should help. Remember, just as the standardized test is a reflection of 
educational goals and procedures, so is the teacher-made test, but even more 
so because the teacher knows what the specific objectives are and what 
methods have been employed to reach them. 


Standardized Versus Teacher-Made Tests 

There should be no battle between standardized and teacher-made tests. 
Most schools use both types of instruments; each has its own advantages 
and disadvantages. Obviously, teacher-made intelligence, personality, 
special aptitude, and interest tests are not the question here. The real 
question is when to use the standardized achievement test and when to use 
the teacher-made achievement test. A general rule of thumb is that the 
teacher-made test is always used as the instrument of choice to appraise 
outcomes of local classroom instruction. On the other hand, the standardized 
achievement test is used to provide information to the teacher, school, and 
student in terms of how local achievement compares to national norms. 
In Table 25 we find a comparison of the relative strengths and weaknesses 
of the standardized and the teacher-made, or “nonstandardized,” tests. 


Table 25. Advantages and Limitations of Standardized and Nonstandardized Tests of 
Achievement 

STANDARDIZED 

1. Validity 
   a. Curricular 
      Advantages: Careful selection by competent persons. Fit typical situations. 
      Limitations: Inflexible. Too general in scope to meet local requirements 
      fully, especially in unusual situations. 
   b. Statistical 
      Advantages: With best tests, high. 
      Limitations: Criteria often inappropriate or unreliable. Size of coefficients 
      dependent upon range of ability in group tested. 

2. Reliability 
      Advantages: For best tests, fairly high, often .85 or more for comparable 
      forms. 
      Limitations: High reliability is no guarantee of validity. Also, reliability 
      depends upon range of ability in group tested. 

3. Usability 
   a. Ease of Administration 
      Advantages: Definite procedure, time limits, etc. Economy of time. 
      Limitations: Manuals require careful study and are sometimes inadequate. 
   b. Ease of Scoring 
      Advantages: Definite rules, keys, etc. Largely routine. 
      Limitations: Scoring by hand may take considerable time and be monotonous. 
      Machine scoring preferable. 
   c. Ease of Interpretation 
      Advantages: Better tests have adequate norms. Useful basis of comparison. 
      Equivalent forms. 
      Limitations: Norms often confused with standards. Some norms defective. 
      Norms for various types of schools and levels of ability are often lacking. 

Summary 
      Advantages: Convenience, comparability, objectivity. Equivalent forms may 
      be available. 
      Limitations: Inflexibility. May be only slightly applicable to a particular 
      situation. 

From Julian C. Stanley, Measurement in Today's Schools, 4th ed., © 1964. Reprinted 
by permission of Prentice-Hall, Inc., Englewood Cliffs, N.J. 


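Table 25 (Continued) 

NONSTANDARDIZED: ESSAY 

1. Validity 
   a. Curricular 
      Advantages: Useful for English, advanced classes; afford language training. 
      May encourage sound study habits. 
   b. Statistical 
      Limitations: Usually not known. 

2. Reliability 
      Limitations: Reliability usually quite low. 

3. Usability 
   a. Ease of Administration 
      Advantages: Easy to prepare. Easy to give. 
      Limitations: Lack of uniformity. 
   b. Ease of Scoring 
      Limitations: Slow, uncertain, and subjective. 
   c. Ease of Interpretation 
      Limitations: No norms. Meaning doubtful. 

Summary 
      Advantages: Useful for part of many tests and in a few special fields. 
      Limitations: Subjective scoring. Time consuming. 

NONSTANDARDIZED: OBJECTIVE 

1. Validity 
   a. Curricular 
      Advantages: Extensive sampling of subject matter. Flexible in use. 
      Discourages bluffing. 
      Limitations: Narrow sampling of functions tested. Negative learning 
      possible. May encourage piecemeal study. 
   b. Statistical 
      Advantages: Compares favorably with standard tests. 
      Limitations: Adequate criteria usually lacking. 

2. Reliability 
      Advantages: Sometimes approaches that of standard tests. 
      Limitations: No guarantee of validity. 

3. Usability 
   a. Ease of Administration 
      Advantages: Directions rather uniform. Economy of time. 
      Limitations: Time, effort, and skill are required to prepare well. 
   b. Ease of Scoring 
      Advantages: Definite rules, keys. Largely routine. Can be done by clerks 
      or machine. 
      Limitations: Monotonous. 
   c. Ease of Interpretation 
      Advantages: Local norms can be derived. 
      Limitations: No norms available at beginning. 

Summary 
      Advantages: Extensive sampling. Objective scoring. Flexibility. 
      Limitations: Preparation requires skill and time. 


Purposes of Teacher-Made Tests 

“The test is the message.” The teacher, in the most direct and meaningful 
manner, tells the student what he really thinks is important through his 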


tests. You will communicate what you consider important by the tests 
you give. 

Tests serve the teacher with valuable information on instructional 
effectiveness. The teacher can ask, “Are the students learning what 
I want them to learn?” In addition, tests provide the teacher with 
information on individual progress. With this kind of data it is possible to 
make decisions on pacing for individual students, such as giving extra help or 
providing enrichment for the advanced pupil. Testing also aids 
in evaluating the class as a whole. 

Tests, of course, are also used in assigning grades. Used 
properly they can forestall parental shock at report card time. The parent 
who has seen his child's test papers knows fairly well what grades his child 
will receive. Reviewing tests with parents and students can help the student 
recognize and work on his areas of weakness. 


Planning Ahead 

There you are, Miss Fry, your first month of teaching and you’ve done a 
pretty fair job so far. Now you have to decide how to evaluate what the 
students have learned because you must turn in the first report card grades 
next week. “Now what kind of test should I give these kids?” 

Quality tests do not spring forth full blown. They are planned in detail when 
you detail the overall goals of the course. Questions dealing with the type of 
test to use, the amount of material to be tested from text and classroom 
discussions, and so on should be decided before you begin classroom 
instruction. 


Defining Objectives 

Defining objectives is not new to those of you who have been exposed to 
a “methods” course in education. You know that defining goals or objectives 
is a primary ingredient of the teaching process. Noll (1965) states this 
very cogently when he says, “To try to teach and evaluate without defining 
objectives is like starting out on a journey without knowing where to go. 
It may be pleasant to wander around for a while, but it is doubtful that any 
sort of progress can be made without some direction” (p. 104). 

General Objectives 

Usually there are two kinds of objectives, general and specific. For 
example, many years ago a group of educators set forth objectives that have 
become known as the seven cardinal principles of education (United States 
Department of Interior, 1918). These general objectives are the following: 

1. To promote good health. 

2. To teach command of the fundamental processes. 

3. To provide for worthy home membership. 

4. To aid in the selection of a vocation. 

5. To offer civic education. 

6. To assure worthy use of leisure time. 

7. To promote ethical character. 



Another list of general objectives comes from the eight-year study 
(Aikin, 1942): 

1. The development of effective methods of thinking. 

2. The cultivation of useful work habits and study skills. 

3. The inculcation of social attitudes. 

4. The acquisition of a wide range of significant interests. 

5. The development of increased appreciation of music, art, 
literature, and other esthetic experiences. 

6. The development of social sensitivity. 

7. The development of better personal-social adjustment. 

8. The acquisition of important information. 

9. The development of good physical health. 

10. The development of a consistent philosophy of life. 

Another excellent source of ideas for general educational objectives is the 
Taxonomy of Educational Objectives (Bloom, 1956, and Krathwohl, 1964). 
The original conception of the taxonomy was to encompass three behavioral 
spheres: cognitive, affective, and psychomotor. The cognitive area covers 
objectives related to recall or recognition of knowledge and problem solving. 
The affective domain deals with changes in interest, values, and attitudes 
and the development of appreciation and adjustment. The psychomotor 
area covers objectives relating to manual and motor skills. Bloom's (1956) 
work covered the cognitive domain, whereas Krathwohl (1964) dealt with the 
affective area. Work in the psychomotor domain has not yet been published. 
Let us first, very briefly, review Bloom and his committee’s work as it relates 
to our concern for objectives. The committee made the following statement 
concerning their findings for the classroom teacher (Bloom, 1956): 

    Use of the taxonomy can also help one gain a perspective on the 
    emphasis given to certain behaviors by a particular set of educational 
    plans. Thus, a teacher, in classifying the goals of a teaching unit, 
    may find that they all fall within the taxonomy category of recalling 
    or remembering knowledge. Looking at the taxonomy categories may 
    suggest to him that, for example, he could include some goals dealing 
    with the application of this knowledge and with the analysis of the 
    situation in which the knowledge is used (p. 2). 

The classification of cognitive educational objectives falls into six major 
categories: 

1.00 Knowledge. 

2.00 Comprehension. 

3.00 Application. 

4.00 Analysis. 

5.00 Synthesis. 

6.00 Evaluation. 



These major categories are broken down into subsections. For example, some 
of the headings under Knowledge are as follows: 

1.00 Knowledge. 

1.10 Knowledge of specifics. 

1.11 Knowledge of terminology. 

1.12 Knowledge of specific facts. 

1.20 Knowledge of ways and means of dealing with specifics. 

1.30 Knowledge of the universals and abstractions in a field. 

Under every heading there is a definition and discussion of that heading. 
These are further clarified by illustrations of the kind of educational 
goals included in the specific category. In addition, at the end of each 
of the sections there is a review of the types of test items that may be used 
to test achievement of the various facets of each objective. It is highly 
recommended that the classroom teacher obtain a copy of Bloom's (1956) 
Taxonomy of Educational Objectives, an excellent resource for preparing 
classroom units of study and tests. 

Krathwohl (1964) and his committee’s work in the affective domain 
produced the following major categories: 


1.0 Receiving. Sensitivity to the existence of certain stimuli. 

2.0 Responding. Active attention to stimuli, for example, going 
along with rules and practices. 

3.0 Valuing. Consistent belief and attitude of worth held about a 
phenomenon. 

4.0 Organization. Organizing, interrelating, and analyzing different 
relevant values. 

5.0 Characterization by a value or value concept. Behavior is 
guided by value. 

For a detailed discussion of these objectives see Krathwohl’s Taxonomy of 
Educational Objectives. 


Specific Objectives 

In the area of specific objectives we are concerned with specific 
subject matter. This means skills, concepts, facts, principles, and so forth. 
A good method of setting forth your instructional objectives is to outline 
them. 

The construction of objectives should not become laborious. Do not be 
overwhelmed by the process; use it to your advantage. This means construct 
objectives for units of study, including evaluation methods, in a manageable 
manner. For example, let us look at a unit on arithmetic. 



General Objective 

A. To develop skill in the process of calculating two-, three-, and 
four-digit problems in addition and subtraction. 

Specific Objectives 

A. Addition. 

1. Problems requiring carrying (also called exchanging). 

2. Problems not requiring carrying. 

3. Use of decimal points. 

B. Subtraction. 

1. Problems requiring borrowing. 

2. Problems not requiring borrowing. 

3. Use of decimal points. 


Let us translate our “general” and “specific” objectives in arithmetic to 
actual test questions: 


ADDITION 

1.     34       67      834      5468 
     + 48     + 26    + 287    + 7679 

2.     65       15      213      8065 
     + 32     + 11    + 381    + 1924 

3.    4.2     5.33     7.88     88.44 
    + 3.3   + 6.11   + 1.33   + 16.67 

SUBTRACTION 

1.     30       46      634      7586 
     - 19     - 27    - 488    - 5797 

2.     48       84      975      3345 
     - 32     - 63    - 960    - 2214 

3.    8.7      6.8     8.43     666.3 
    - 4.5    - 1.9   - 7.99   - 566.2 


Figure 41 presents an actual blueprint of a test derived from instructional 
objectives in social studies. 





Steps in Test Construction and Administration 

So far we have talked mostly in general terms. Let us get down to 
practical situations by outlining, step by step, the actual process of test 
construction and administration. 


1. “Get Ready” Stage. 

a. Secure all the instructional materials bearing on the test, 
for example, books, notes, and content outline or unit objectives. 

b. List the objectives you want your test to measure. 

2. “Get Set” Stage. 

a. Plan type of test to be used and the general format of the test. 

b. Write a preliminary draft of the items to be used. 

c. Plan the length of test. How long do you want the test to be: the 
whole class period, or a shorter time? 

d. Go over test items. Do this several days after you have written the 
preliminary draft. Take out the items that do not seem relevant 
and polish others. 

e. Arrange items in order. Rate your items for difficulty and place 
the easiest first so as not to discourage the students and to give the 
poorer students some motivation to continue with the test. 

f. Instructions. Be sure you know what you want the students to do 
and are able to communicate your wishes to them. 

g. Decide on the “Rules of the Scoring Road.” Be sure you have 
written the correct answer before testing. This is as true for essay 
questions as for objective items. 

3. “Go” Stage. 

a. Time. Whatever you decide upon, be consistent. For example, 
do not state, “You will have thirty minutes to complete the test” 
and then give the slower students an hour. It is not fair to the 
others who attempted to complete the assignment in the stated 
time. Generally allow enough time for at least 90 per cent of the 
students to finish. 

b. Physical Conditions. Try to maintain a quiet, disturbance-free 
room. Do not walk around the room, tap a pencil, talk to other 
teachers, and so on. 

c. Administration. Do not be hostile in tone or manner when giving 
directions. Your voice should be clear and instructions easily 
understood by all the students. Testing is difficult for all of us. 
Do not make it worse by using it as punishment. 


Evaluation of Your Test 

The reader has already been exposed in earlier discussions to what con- 
stitutes a good measuring device. In this section we will of necessity repeat 
some of these points as they apply to teacher-made tests. Ebel (1965) 
summarizes the qualities of a good test by noting the following: 

1. Relevance. Is your test measuring your educational objectives 
and actual instruction? This is the validity aspect of the classroom 
test. 

2. Balance. How close does your test come to the "ideal"? That 
is, do your items reflect your stated objectives? 

3. Efficiency. How much time does your test take to administer? 
Is it taking too much? A compromise between available time for 
testing and scoring and other needs must be made. 

4. Objectivity. There should be agreement by experts on the "right" 
or "best" answer to a question. Objectivity does not refer to the type of 
test (for example, multiple-choice or completion items) but is directly 
concerned with the scoring of the test. 

5. Specificity. Are you testing the subject matter presented in the 
classroom? That is, does the test discriminate between those students 
who have learned the subject matter and those who have not on a 
better than chance basis? 

6. Difficulty. The difficulty of test items should be at the level of the 
group being tested. Generally, we can consider a test appropriate if 
each item in the examination is passed by half of the students. 

7. Discrimination. A test item discriminates if more good students 
answer it correctly than do poor students. 

8. Reliability. Is the test measuring whatever it does measure 
consistently? 

9. Fairness. Does each student have an equal chance to demonstrate 
his knowledge? The test should be constructed so that each student 
has an equal chance to “show his stuff.” 

10. Speediness. Are scores influenced by speed of response, and if 
so to what degree? In testing achievement, speed should play a very 
minor role in determining a student’s score. There should be sufficient 
time for almost all students to finish the test. 
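
The difficulty and discrimination qualities listed above can be checked with 
simple arithmetic once a test has been scored. What follows is only a minimal 
sketch, not a prescribed procedure. It assumes each answer is recorded as 1 
(right) or 0 (wrong), takes difficulty as the proportion of students passing 
an item, and takes discrimination as the difference in that proportion between 
the upper and lower halves of the class ranked by total score. The function 
names and the six-student class are hypothetical. 

```python
def item_difficulty(responses, item):
    """Proportion of students who answered the given item correctly."""
    return sum(r[item] for r in responses) / len(responses)


def item_discrimination(responses, item):
    """Upper-half minus lower-half proportion correct on one item.

    Students are ranked by total score. With an odd class size the
    middle student is left out of both halves.
    """
    ranked = sorted(responses, key=sum, reverse=True)
    half = len(ranked) // 2
    upper, lower = ranked[:half], ranked[-half:]
    p_upper = sum(r[item] for r in upper) / half
    p_lower = sum(r[item] for r in lower) / half
    return p_upper - p_lower


# Hypothetical class: six students, four items each (1 = right, 0 = wrong).
scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]

for i in range(4):
    print("item", i + 1,
          "difficulty", round(item_difficulty(scores, i), 2),
          "discrimination", round(item_discrimination(scores, i), 2))
```

In this made-up class the second item is passed by exactly half of the 
students and separates the two halves completely, while the fourth item is 
passed equally often by good and poor students and so does not discriminate. 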


1 Slight modifications in terms of descriptions, but not qualities, 
have been made. 


Types of Teacher-Made Tests 

There are basically two types of teacher-made tests. They are essay and 
objective. There are, however, many forms of the objective type. The 
essay is generally confined to a "short answer" or a longer "discussion" form. 
The following list of different types of objective teacher-made tests is cited 
to show the scope and breadth of available formats. Some of these will be 
discussed in the following chapters. 

OBJECTIVE TESTS2 

1. Simple recall. 

a. Basic. 

b. Problem type. 

c. Maps, charts, and so on. 

2. Completion. 

a. Basic. 

b. Matching. 

c. Analogies. 

d. Maps, charts, and so on. 

3. Alternative Response. 

a. Basic. 

b. Two-clause. 

c. Three alternatives. 

d. Converse. 

e. With correction. 

f. With qualifications. 

g. With diagrams. 

h. With analogies. 

4. Multiple Choice. 

a. Basic. 

b. Recall. 

c. Common principle. 

d. Results. 

e. Causes. 

f. Charts, maps, and so on. 

5. Matching. 

a. Basic. 

b. Three columns. 

c. Master list. 

d. Analogies. 

6. Rearrangement. 

a. Chronological. 

b. Order of importance. 

c. Order of difficulty. 

d. Length, weight, logic, and so on. 

In the chapters that follow we will discuss many of the preceding types of 
teacher-made tests. The reader is already familiar with some from the 
chapter on achievement tests. 

2 Adapted from A. J. Lien, Measurement and Evaluation of Learning: A Handbook 
for Teachers. Dubuque, Iowa: Wm. C. Brown Company Publishers, 1967. 



Essay Versus Objective Tests 

There are no definitive rules for whether the essay or the objective test is 
the better choice in a given situation. Table 26 presents a comparison 
of some of the major characteristics of the two types of tests. 


Table 26 

Abilities Measured 
   Essay: Requires the student to express himself in his own words, 
   using information from his own background and knowledge. Can tap high 
   levels of reasoning such as required in inference, organization of ideas, 
   comparison and contrast. Does not measure purely factual information 
   efficiently. 
   Objective: Requires the student to select correct answers from given 
   options, or to supply an answer limited to one word or phrase. Can also 
   tap high levels of reasoning such as required in inference, organization 
   of ideas, comparison and contrast. Measures knowledge of facts 
   efficiently. 

Scope 
   Essay: Covers only a limited field of knowledge in any one test. Essay 
   questions take so long to answer that relatively few can be answered in 
   a given period of time. Also, the student who is especially fluent can 
   often avoid discussing points of which he is unsure. 
   Objective: Covers a broad field of knowledge in one test. Since 
   objective questions may be answered quickly, one test may contain many 
   questions. A broad coverage helps provide reliable measurement. 

Incentive to Pupils 
   Essay: Encourages pupils to learn how to organize their own ideas and 
   express them effectively. 
   Objective: Encourages pupils to build up a broad background of 
   knowledge and abilities. 

Ease of Preparation 
   Essay: Requires writing only a few questions for a test. Tasks must be 
   clearly defined, general enough to offer some leeway, specific enough 
   to set limits. 
   Objective: Requires writing many questions for a test. Wording must 
   avoid ambiguities and "giveaways." Distractors should embody most 
   likely misconceptions. 

Scoring 
   Essay: Usually very time-consuming to score. Permits teachers to comment 
   directly on the reasoning processes of individual pupils. However, an 
   answer may be scored differently by different teachers or by the same 
   teacher at different times. 
   Objective: Can be scored quickly. Answer generally scored only right or 
   wrong, but scoring is very accurate and consistent. 

From Making the Classroom Test: A Guide for Teachers. Educational Testing 
Service Evaluation and Advisory Service Series No. 4. © 1959 by Educational 
Testing Service. Second Edition 1961. Reprinted by permission of Educational 
Testing Service. 


The student of measurement should remember that neither the essay nor 
the objective test alone is completely satisfactory in measuring academic 
progress. Each has its own advantages and disadvantages, and it is your 
responsibility to use the right test in the right place for the right purpose. 


Teachers, Tests, and Reality 

An attempt has been made in this chapter to give some guidelines for 
constructing tests. In the following chapters detailed examples and sugges- 
tions for developing essay and objective tests will be presented. It should be 
noted, however, that the time available will be the common denominator 
in our discussions. 

Remember also that no single teacher-made test should be the only basis of 
important educational decisions. Day-to-day classroom performance, scores 
on standardized tests, and academic achievement measured by a series of 
teacher-made tests over a period of time should be considered in the whole 
picture of pupil evaluation. As Nunnally (1964) states, “To reach important 
conclusions about students on the basis of only one teacher-made test would 
be as unwise as it would be for the prospector to abandon his claim because 
the first shovelfull was not brimming with gold” (p. 106). 


References 

Aikin, W. M. The story of the eight-year study, with conclusions and recommendations. 
New York: Harper & Row, 1942. 

Bloom, B. S. (Ed.) Taxonomy of educational objectives. Handbook 1: The cognitive 
domain. New York: David McKay, 1956. 

Ebel, R. L. Measuring educational achievement. Englewood Cliffs, N.J.: Prentice- 
Hall, 1965. 

Krathwohl, D. R., Bloom, B. S., and Masia, B. B. Taxonomy of educational objectives. 
Handbook 2: The affective domain. New York: David McKay, 1964. 

Lien, A. J. Measurement and evaluation of learning: A handbook for teachers. 
Dubuque, Iowa: Wm. C. Brown, 1967. 

Noll, V. H. Introduction to educational measurement. (2nd ed.) Boston: Houghton 
Mifflin, 1965. 

Nunnally, J. C. Educational measurement and evaluation. New York: McGraw-Hill, 
1964. 

United States Department of Interior, Bureau of Education. Cardinal principles of 
education. Bulletin No. 38, 1918. 



CHAPTER 



The Essay Test 


In 1834, Horace Mann substituted a uniform written examination for the 
usual oral testing of students in the Boston public schools (Chauncey and 
Dobbin, 1966). Before Mann’s “step forward” the examination of students 
consisted of interrogation by the teacher or a board of school officials. Since 
Mann’s radical change the essay test has undergone a great deal of investiga- 
tion and critical comment. Let us very briefly review some of the studies 
and comments as they relate especially to reliability. We shall keep them in 
mind in our later discussion of developing and scoring essay tests. Above all 
they should serve as cautionary lights in our use and application of the essay 
test in the total evaluation of our students. 


Reliability 

In 1912 and 1913, Starch and Elliott (1912, 1913a, 1913b) produced the 
first studies on the reliability of grading essay tests. These three classical 
investigations covered the subject matter of high school English, history, 
and mathematics. 

In the area of English the investigators selected two English themes and 
had them graded by English teachers on a 100 per cent basis. Grades on the 
same paper ranged between 64 and 99, with a median of 88 (Starch and 
Elliott, 1912). 

Starch and Elliott (1913a), in their study of history tests, found an even 
larger discrepancy of grades. The range was over seventy points on the same 
100 per cent basis. In their mathematics investigation they sent a plane 
geometry paper to 138 geometry teachers for grading, because they thought 
that mathematics should be more subject to grading consistency. Their 
results revealed a grade range between 28 and 95 (Starch and Elliott, 1913b). 
In a later and more sophisticated investigation Falls (1928) requested 100 
English teachers to grade a paper already evaluated by a committee as 
excellent. The 100 teachers did not know that a committee had previously 
evaluated the paper or that the writer, a high school senior, was a reporter for 
a large city daily paper. Grades on this paper were between 60 and 98. In 
addition, comments placed the writer’s grade level in school from fifth grade 
to a junior in college. Study after study has confirmed these findings. 

As the years have gone by, research designs, methods, and procedures of 
analysis have become more sophisticated but the results have generally been 
the same. For example, in a more contemporary study Myers, McConville, 
and Coffman (1966) used 145 readers and ^,000 essays. They found average 
single reader reliabilities of 0.41. The average reliability rose to 0.73 when the 
number of readers was increased to four. This dramatic increase was achieved 
under controlled conditions with trained readers who read the essays as a 
whole and then utilized a four-point scale. This research revealed that if a 
teacher is going to give essay tests, then several graders should be used. 
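
The rise in reliability with four readers is roughly what the Spearman-Brown 
formula predicts when the ratings of several readers are pooled. The formula 
is brought in here only as an illustration of why adding readers helps; it is 
not attributed to the study itself, and the function name below is 
hypothetical. 

```python
def pooled_reliability(single_reader_r, n_readers):
    """Spearman-Brown estimate of the reliability of pooled ratings.

    single_reader_r: reliability of one reader's grades
    n_readers: number of readers whose grades are combined
    """
    return (n_readers * single_reader_r) / (1 + (n_readers - 1) * single_reader_r)


# Single-reader reliability of 0.41, as reported in the text.
for k in (1, 2, 3, 4):
    print(k, "reader(s):", round(pooled_reliability(0.41, k), 2))
```

With a single-reader reliability of 0.41 the estimate for four readers comes 
to about 0.74, close to the 0.73 actually observed. 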

James (1927) attempted to study the reliability of grades assigned by the 
same instructor. He selected four compositions that were judged to be of the 
same quality and asked forty-three English teachers to judge them. Two 
were in good handwriting and two in poor. Two months later the same themes 
were presented to the same teachers but with the handwriting reversed; 
that is, the two that had been in poor handwriting were reproduced in good 
penmanship and vice versa. The results revealed a one letter grade difference 
in favor of the good handwriting. 

Most of the research on essay tests has involved the reliability of graders of 
essay tests. A paucity of data exists concerning the actual reliability of the 
instrument itself. Cureton (1938) in a discussion of this problem states that 
in order to obtain an estimate of essay test reliability, two equivalent forms 
are necessary at the beginning and that they be administered to the same 
individuals at an optimal time interval. Each reader would have to read 
all of the answers to a specific question on both forms. If this situation could 
be created, then a correlation between grades obtained on one form with 
those obtained on the other could be considered as an estimate of reliability. 
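
The correlation just described can be computed directly from two lists of 
grades. The sketch below is an illustration only: it assumes five students 
who each took both forms, uses the ordinary Pearson product-moment 
correlation, and the grades themselves are invented for the example. 

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x)
    spread_y = sum((b - mean_y) ** 2 for b in y)
    return covariation / sqrt(spread_x * spread_y)


# Hypothetical grades for the same five students on two equivalent forms.
form_a = [78, 85, 62, 90, 71]
form_b = [74, 88, 65, 93, 70]

print("equivalent-forms reliability estimate:", round(pearson_r(form_a, form_b), 2))
```

A coefficient near 1.00 would mean that students keep nearly the same standing 
on both forms; a low coefficient would mean that a grade depends heavily on 
which form the student happened to take. 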

Payne states that the best method of increasing the reliability of an essay 
test is to increase the number of questions and restrict the extensiveness of 
the answers. The more specific and more narrowly defined the questions, 
the less likely they are to be ambiguous to the examinee. This should result 
in better comprehension and performance of the assigned task, and 
the reliability of the instrument and scoring should be increased. 
The teaeher who wants to administer essay tests m ^P^" “f ^ 
findings on reliability should heed Payne s advme but at ‘h' ’ 

very auticus in interpreting the scores. Some of the advtce on the comtrac 
.ion of essay tests which will be discussed later in ‘h- to 
but caution should still be the watchword. It should also be noted that t 

research and implications for the reliability of the essay test that have b«n 

made are in reference to evaluation. The educational process, ° / 

involves more than evaluation. Teachers may feel that the essay test, thougn 
not necessarily reliable in the measurement sense, maybe an excellent instru 
tional device. That is, it may help the student learn to organize his thougni 
and express them in writing- We will have more to say on the advantages 
the essay test later in this chapter. 


Validity 

Achievement tests are validated on the basis of whether or not they cover 
the subject matter taught and the goals of the unit of study. Validity, as is the 
case with reliability, tends to be higher as the number of questions is 
increased. Essay tests necessarily contain a relatively small number of ques- 
tions. Thus the sampling of the objectives outlined in the unit or course 
specifications will be small and therefore will tend to lower the validity of the 
test. It is common after taking an essay test for students to feel that some of 
their study time was “wasted” because so much of the course was not 
measured in the four, five, or six essay questions presented to them. 

Another problem in the validity of essay tests is the evaluation of more than 
one area of learning. For example, some instructors reduce the score for 
poor spelling or grammar on a history paper, thereby lowering the history 
grade. The resultant grade does not reveal what the student knows about 
history alone but also what kind of speller and/or grammarian he is. This 
type of “history” evaluation generally does not serve the cause of history or 
that of spelling and grammar very well. Our old definition of validity certainly 
applies here: that is, whether or not the test measures what it is supposed to 
measure. If you are testing history, then test history, not English. If, on the 
other hand, you feel that educational objectives are best served by evaluating 
English usage and content, then give two grades, one for English and one 
for history, but be sure to follow through on the report card. If history 
alone is on the card, give only the history grade or add the English evaluation 
so that parents and other school personnel in future years will have a clear 
understanding of how the pupil was evaluated and in what areas. 




to IxpoMd « o' *t»*n>3 

fall tv bZT°'' r‘''' "" S-tl' be “orof "ws" 

raato’Tr=,e H bno'tWge of the subject 

.hoovefafaS;^^^^ 


Advantages and Disadvantages of the Essay Test 

The essay test, despite low reliability and validity, is held in high esteem
by many teachers from elementary through graduate school. Let us look at
some of the common arguments given in defense of essay tests and their
merit.

Thorndike and Hagen (1961) state that the distinctive advantage of the 
essay test lies in the requirement of asking the student “to produce, rather 
than merely to recognize the answer.” They go on to state: 

Thus, it minimizes the possibility of getting the answer by blind 
guessing or by using little cues to outguess the test maker. It can, if 
the questions are well prepared, bring out the examinee's ability to 
select important facts or ideas, relate them to one another, and organize 
them into a coherent whole. Emphasizing this integrative type of
product, it elicits, so it is claimed, better study habits in those who
are preparing for it [p. 42].


Does the essay test in fact present a chance for the student to show his
organizational skills? The answer is dependent on the quality of the test.
If the test is well constructed it probably can. The problem, however, is that
often the student is not presented with an examination geared to this goal.
The student may find himself at a loss in attempting to answer a long
discussion question, for example, “Discuss the antecedents of pop art.” This
type of essay problem does not enhance organizational ability nor does it
call for the student's skill in selecting important facts or ideas. The student
faced with our example problem would have to ask himself first of all, “What
does the teacher want? Does she want me to start with the Egyptians or
primitive period of art or does she want me to trace the development of pop art in
the twentieth century?” On the other hand, if the problem is stated in the
following manner, “Discuss the influence of cubism on pop art,” the student
would have an idea of where to start developing his line of reasoning;
he would know what the teacher wants. Similarly, all the students would
be dealing with the same problem and therefore give the teacher a better basis
for comparison. Used in this manner the essay test can be useful in evaluating
a student’s ability to organize, select, and integrate important ideas and trends.
Another hallmark of essay tests is that they require students to answer




questions in their own words and handwriting. We [...]
some of the problems in this area in our review of reliability and validity.
You will remember the James (1927) study [...]
penmanship usually yielded one grade higher than [...]
the student who can use the English language effectively [...]
better on discussion questions. The student who may not [...]
either in terms of penmanship or English usage may, in fact, [...]

more about zoology or history and yet receive a lower grade. It is very difficult
not to be influenced by such external factors as neatness, handwriting, sentence
structure, spelling, and vocabulary. This is not to state that these factors are
not important. On the contrary, they are legitimate and necessary
objectives. If people cannot communicate their knowledge and ideas, how
do we, or society in general, know that they possess either? The problem,
then, is not the worth of grammar and penmanship but how to evaluate a
student’s knowledge and understanding of biology or history without evaluating
other skills and knowledge.

A major disadvantage of the essay test, as has already been mentioned,
lies in the scoring. How do you evaluate the answer given? What are the
degrees of correctness? Downie (1967) in reviewing the merits and limitations
of essay tests states,


If we cannot truly assert that essay tests foster skill in writing and
answer adult needs better than objective tests, why use the essay
test? Probably for most classroom situations they are unnecessary
except to evaluate how well a student can write a theme or essay. If we
want to assess style, quality, and other aspects of writing, it is obvious
that the essay test item has to be used. Even here though studies
have shown that objective tests of writing ability can predict achievement
in writing, the latter measured by both teachers’ estimates and
grades received, better than do the essay tests [p. 202].


This writer, though agreeing with the studies of the essay test’s poor
reliability, cannot go along with the pessimistic view of Professor Downie.
Properly constructed essay tests can serve as aids in developing and fostering
skills in writing. People learn by doing, not by showing potential to do. One
cannot learn to play the piano because one’s musical ability is potentially great.
One learns to play the piano by playing the piano. One learns to organize and
express oneself in written discourse by actual writing experience. In a strict
measurement sense the essay test is not the best device for evaluation. In an
educational sense, which includes measurement, however, the essay test
may serve as a useful instructional tool in facilitating learning.


Construction of Essay Tests 

Let us now proceed to the ingredients that make essay tests more useful
to classroom evaluation. First of all, we can improve the essay examination




by limiting it to the objectives that it measures best, for example, skills in
the selection and organization of important facts and ideas. Second, we grant
the deficits of the essay test, such as limited sampling, and go on from there

established procedures student’s achievement? If your 

"Does this test bring out f cogent manner. Avoid cluttering 

2. Phrase your questions in a pr^ d | Remember, use 

" 

^eSin your essay — S 

cmtmt. compare, prismt AiAut, echo. Ihl, and dirriiK (unless 

begin essay questions with auch ^ y specifically and thereby 

you detail the concepts you ^“““rds are either too nebulous or 

Mnlain what you mean by dtsaus. 

are prone to elicit mere cecitafon of fao ““f ' L" 

4!^Define and narrow the “^1'” .^escribe the various facets “f co>oni 

Another example of improving 
following: 

1 s lluitcd Nations has been a success 

Explain why you tbinkthe United N 

(rates a 11'"“!. "rm^,.as handled unsuc^sfuUy, P 

Nations. Your css y 


Poor 


Better 




(2 or 3 pages in longhand). [Educational Testing Service,
1951, p. 22.]

5. Never write essay questions with the words you, in your
opinion, or write all you know about. Thorndike and Hagen (1961) state,

when a teacher asks “Why do you think that the Articles of
Confederation provided a poor basis for the formation of our central
government?,” he is not really interested in the student’s opinion.
He actually wants to determine whether the student knows the
fundamental weaknesses of the Articles of Confederation as stated by the
teacher or text. Therefore the question would be better if written:
“Why did the Articles of Confederation prove to be unworkable as
a framework for our national government?” [p. 54].

If the purpose of the essay test is to obtain student attitudes (which are
impossible to grade) or to measure the ability of a student to present a
defense of his position, then you or in your opinion are permissible. The
teacher should, however, be careful in the case of measuring the student’s
ability to present a sound and rational position to grade on that premise and
not on the position per se.

6. Construct your questions so that they may be answered at all degrees
of competency. Every student should be able to respond to the question.
However, the students may reveal varying levels of knowledge and
understanding in their responses. Do not present questions that only the top 10
per cent can answer or questions that do not differentiate between the bottom
10 per cent and the top 10 per cent. The difficulty of the question should
lie in the response, not in the vagueness or remoteness of the question.

7. Allow enough time for students to respond to the questions. Remember
you are attempting to measure understanding and ability to apply facts, not
how fast students are able to write or organize their thoughts. If you give
more than one essay question you might suggest a time allowance for each
question.

8. Require all students to respond to the same questions. Do not offer
alternative choices among the essay questions presented. For example, if you
present six questions and ask the students to choose four to write on, you
will not have a common basis upon which to evaluate different individuals
within the classroom. Presenting the same questions to all the students provides
a common reference point in comparing students and increases the validity
of the test.


Grading Essay Tests 

Grading essay tests is as important as, if not more important than,
the construction of good questions. Of course, both are needed if




5. Average two or more graders’ ratings. If two or more teachers are
using the same test and have agreed upon the scoring procedure and
all the factors are held constant, except for different graders, then an
average of the scores for each test produces a more reliable rating.
This, of course, takes time and would probably be done only in very
important situations such as promotion or graduation.
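
As a rough illustration of suggestion 5, the averaging could be sketched in Python as follows. The function name, the grader labels, and the scores are hypothetical and do not come from the text; the sketch assumes that every grader has scored the same set of papers on the same agreed scale.

```python
from statistics import mean


def average_ratings(ratings_by_grader):
    """Average each paper's score across graders.

    ratings_by_grader maps a grader's name to a dict of {paper: score}.
    All graders are assumed to have used the same agreed scoring procedure.
    """
    graders = list(ratings_by_grader.values())
    if len(graders) < 2:
        raise ValueError("averaging needs at least two graders")

    papers = set(graders[0])
    for scores in graders[1:]:
        if set(scores) != papers:
            raise ValueError("every grader must score the same papers")

    # One averaged rating per paper, in a stable (sorted) order.
    return {paper: mean(scores[paper] for scores in graders)
            for paper in sorted(papers)}


if __name__ == "__main__":
    # Hypothetical scores from two teachers using the same test.
    ratings = {
        "grader_a": {"paper_1": 80, "paper_2": 70},
        "grader_b": {"paper_1": 90, "paper_2": 60},
    }
    print(average_ratings(ratings))
```

The sketch refuses to average when the graders have not all scored the same papers, since the suggestion holds only when everything but the grader is held constant.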


A Case History 

The following case history of an essay test and one teacher’s approach is
presented to illustrate some of the points we have covered in actual practice.
In reading it, evaluate what Mr. Frank did and did not do. How would
you rate his approach based on our previous discussions?

An Essay Test to Measure a Special Ability in Eighth-Grade
American History¹


Mr. Frank’s eighth-grade American history class had been studying
the fighting that took place between the Indians and the settlers in the
western states. The class had just completed several discussions on
the rights of each side.

The major purpose of having these discussions was to improve the
pupils’ ability to find and express convincingly facts and arguments in
support of their opinions.

Mr. Frank decided that he would like to give a test to measure the
class’s skill in this ability. At first, he considered giving an objective
test. He thought he might list a number of arguments presented by
both the Indians and the settlers and then ask the class to identify
those which were backed up by facts. But then he decided against
using this kind of test. An objective test would require the student
to select sound arguments: it would not call upon him to develop
and present them convincingly as he would do in actual discussion.
Accordingly, Mr. Frank decided that an essay test would satisfy his
purposes best.

Since the subject matter of the test was limited, it was unnecessary
for Mr. Frank to prepare a written plan for the test. In a sense, the
test questions themselves constituted the test plan. Here is the test he
prepared. It had three questions:

1. Pretend that you are a settler and give three general reasons why
you think your side is right in the war with the Indians. For each of
the reasons, describe an actual happening to support your argument.

2. Pretend that you are an Indian and give three reasons why you

¹ From Making the Classroom Test: A Guide for Teachers, Educational Testing
Service Evaluation and Advisory Service Series No. 4. © First Edition Copyright
1959, by Educational Testing Service. Second Edition 1961. Reprinted by permission
of Educational Testing Service.




think your side is right in the war with the settlers. For each of the
reasons, describe an actual happening to support your argument.

3. Look at the six reasons given by both sides and decide which one
would be most dangerous if everyone accepted this kind of reasoning.
Give two examples of how people might do bad things if they accepted
this kind of reasoning.

Before scoring the papers, Mr. Frank analyzed the points which he 
thought would appear in an ideal response and decided how much he 
would count for each point. He decided not to take off credit for 
mistakes in spelling and English usage. But he planned to show the 
English teacher any paper which was especially poorly written so that 
the English teacher might give help in composition writing to those 
pupils who needed it. 

After Mr. Frank corrected the papers, he found that most of the 
pupils had proceeded well on Questions 1 and 2 requiring reasons and 
examples. However, many of them had floundered on Question 3,
which required them to point out the dangerous implications of one 
argument. Because of their difficulty with Question 3, Mr. Frank 
decided to organize a series of classroom debates, so that the students 
would get practice in extending, attacking, and defending an argument. 

On an essay test of this sort, scores are not highly reliable. On a
second reading, after a little time lapse, Mr. Frank would find it difficult
to give every paper the same mark as on the first reading. Furthermore,
several teachers grading the same papers would probably not agree very
closely with one another. Therefore, Mr. Frank avoided giving an
exact numerical score for each paper but instead assigned three general
grades: good, average, and poor. However, he wrote many comments
on the papers so that the pupils would have a better idea of the
strengths and weaknesses of their arguments. He also read several
papers to the class for discussion purposes, making full use of the test
as an instructional device.


References 


Chauncey, H. and Dobbin, J. E. Testing has a history. In C. I. Chase and H. G.
Ludlow (Eds.), Readings in educational and psychological measurement. Boston:
Houghton Mifflin, 1966. Pp. 3-18.

Cureton, E. E. Definition and estimation of test reliability. Educational and
Psychological Measurement, 1958, 18, 715-38.

Downie, N. M. Fundamentals of measurement: Techniques and practices. (2nd ed.)
New York: Oxford University Press, 1967.

Falls, J. D. Research in secondary education. Kentucky School Journal, 1928, 6,

James, H. W. The effect of handwriting upon grading. The English Journal, 1927,
16, 180-85.



Myers, A. E., McConville, C., and Coffman, W. E. Simplex structure [...]
of essay tests. Educational and Psychological Measurement [...]

Payne, D. A. The specification and measurement of learning [...]

S.i-D!td ns; K ^5 « of S-Ofos “s'- 

Starch, D. and Elliott, E. C. Reliability of grading work in mathematics. School [...]

Thorndike, R. L. and Hagen, E. Measurement and evaluation in psychology and
education. (2nd ed.) New York: Wiley, 1961.


Additional Readings 

In addition to the readings below, the preceding references are also excellent
sources of information on test construction.

Adams, G. S. Measurement and evaluation in education, psychology, and guidance.
New York: Holt, 1964. See Chapter 10.

Cox, R. C. Item selection techniques and the evaluation of instructional objectives.
In D. A. Payne and R. F. McMorris (Eds.), Educational and psychological
measurement: Contributions to theory and practice. Waltham, Mass.: Blaisdell
Publishing Co., 1967. Pp. 157-61.

Educational Testing Service. ETS builds a test. Princeton, N.J.: Educational
Testing Service, 1965. This is a twenty-four page booklet on how a standardized
test is constructed. Much of it can be adapted for classroom use.

Engelhart, M. D. Suggestions for writing achievement test exercises. In D. A.
Payne and R. F. McMorris (Eds.), Educational and psychological measurement:
Contributions to theory and practice. Waltham, Mass.: Blaisdell, 1967.

Foreman, E. Improving the reliability of a teacher-made test: A case study. In
D. A. Payne and R. F. McMorris (Eds.), Educational and psychological
measurement: Contributions to theory and practice. Waltham, Mass.: Blaisdell, 1967.
Pp. 152-57.

Lien, A. J. Measurement and evaluation of learning: A handbook for teachers.
Dubuque, Iowa: Wm. C. Brown Co., 1967. See Chapter 5.

Lindeman, R. H. Educational measurement. Glenview, Ill.: Scott, Foresman.
See Chapter 4.

Mosier, C. I., Myers, M. C., and Price, H. G. Suggestions for the construction of
multiple-choice test items. In C. I. Chase and H. G. Ludlow (Eds.), Readings in
educational and psychological measurement. Boston: Houghton Mifflin, 1966.
Pp. 272-81.

Noll, V. H. Introduction to educational measurement. (2nd ed.) Boston: Houghton
Mifflin, 1965. See Chapters 5, 6, and 7.

Remmers, H. H., Gage, N. L., and Rummel, J. F. A practical introduction to
measurement and evaluation. (2nd ed.) New York: Harper & Row, 1965. See
Chapter 8.

Stanley, J. C. Measurement in today’s schools. (4th ed.) Englewood Cliffs, N.J.:
Prentice-Hall, 1964. See Chapters 6 and 7.



CHAPTER 



■ ,845 when he inlroduced a uniform 

requiting all student beginning of the seco . ^ „ bas been 

later standardbed te^st • ^ testing- teacher- 

reviewed earlier an parallel t , —Qn^cBomeofthc 

made objective was developed the merits 

movement. Th“^ ,^5,. We have „i,ics of objective tests 

disadvantages of <)'=“ L „£ instruments. Ihe „tm„,ization of 

and limitations of o ^ factual tecall, e p conceptuaI|za' 

state that they much guessing, do not d 

obscure details, one-umg 'o“ « ,he op^'^ ' J ,he previous 

tion, and do , „i.h the precedmg^”'™^^^^^^ ,0 what con- 

Although SVC have d “"’“'''i^'hlwever our main attention 

chapters, in this chaptc importantly, j utilising the objective 

stitutes an objective '^methods of improvnS 

wall be focused on prae^ ' ,he results. 3„ 

test for classroom use 




Characteristics 

The objective test is so called because the one, 

when the test item is written. That ts the completely 

the major — 

t^hesinseoringthe test. Objective testsattempttoovercorneth.sd^^^ 

It should be noted that the word o^ecte is in J . , before 

scoring. These scoring rules for ohjecttve tests are 
testing begins. The actual content and coverage of objective tests, however,
may be as subjective as the essay test. It is possible that the teacher may be
wrong in what she designates as the correct answer. The important point to
remember is that the teacher must be sure of what she considers the
correct answer. In an essay test teachers sometimes have
difficulty setting up their own standards for the correct answer. The objective
test item, however, must have only one correct answer, decided
when the item is written, not when it is graded.

The objective test is a structured examination. That is, each examinee is
presented with exactly the same problem. The essay question, no matter
how well it is phrased, will have different meanings to different students.

The objective test, on the other hand, being completely structured, is
answered in a prescribed manner. The student is not called upon to organize
his response as he is in the essay format.

The objective test requires the student to recognize, not recall, the correct
answer. This is because most objective tests present given alternatives (with
the exception of the completion item), one of which is the correct response.
It should be noted, however, that objective items can be constructed to
appraise recalling and the use of previously learned information, although in
most cases the objective test does not tap this source of knowledge.

In our previous discussions of reliability, we have noted that an increase
in the number of items tends to increase the reliability of the test. The objective
test lends itself to this task more readily than the essay because each item of the
objective test is short and requires less response time. Thus the greater number of
items in the objective test can sample more topics.
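
The relation between test length and reliability can be illustrated with the Spearman-Brown formula, a standard psychometric result that this section does not itself name. The following Python sketch is offered only as an illustration: the function name and the figures are hypothetical, and the formula assumes that the added items are comparable in quality and content to those already in the test.

```python
def spearman_brown(reliability, length_factor):
    """Projected reliability when a test is lengthened (or shortened).

    reliability   -- reliability of the current test, between 0 and 1
    length_factor -- new length divided by current length
                     (2 means the test is doubled)

    Assumes the added items are comparable to the existing ones.
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    if length_factor <= 0:
        raise ValueError("length_factor must be positive")
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)


if __name__ == "__main__":
    # Hypothetical test with reliability .50, lengthened two and four times.
    for k in (1, 2, 4):
        print(k, round(spearman_brown(0.50, k), 2))
```

Under these assumptions the projected reliability keeps rising as comparable items are added, but each further addition helps less than the one before it.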

Scoring the objective test is routine, because a scoring key is established at
the time the test is constructed. The score, therefore, will be the same no
matter who scores it. The teacher’s pet, the handwriting expert, and the
scribbler will all obtain the same scores if they
choose the same responses.


General Rules for Item Writing 

The following rules should be observed when writing objective tests:

1. Observe the rules of good English usage. Gear your language to
the whole group of students. Attempt to communicate to the students


in readable language that is not stilted or complex. Be sure that even
the duller student can understand the items.

,m„n answers that would be aerced 

answer is d T""' T O'' 

answer is dependent upon obscure &ey uords. 

3. Avoid items that answer other hems in the test. 

4. Use your test items to tap important, not trivial, areas of knowledge.

5. Identify any authorities cited in your item.


Types of Objective Items 

There are many different types of objective items. Our attention will be 
focused on the most prominent. 

True-False Items

Do not use true-false items if at all possible. The true-false item has been
very popular with teachers, probably because it is easy to construct and
requires little time. Do not allow the ease of construction to lure you into
the true-false trap. Good true-false items are not easy to write and even the
good ones have many limitations. The following statements are representative
of the major drawbacks of the true-false item:

1. The true-false item tends to be greatly influenced by guessing.
If, for example, a true-false test contained 100 items, it is very probable
that most students could obtain a score of around 50 by guessing.
This restricts the range and meaning of the scores. What is the relationship
of John's score to guessing? Can one really say that Mary is a
better student than Jane, or is she a better guesser? The longer you
make the test the less chance of measurement error resulting from
guessing; however, guessing still has an undue influence on scores.

2. It is almost impossible to make statements either absolutely true
or false. For example, read the following statements and see if either
is absolutely true or false.

T F Adolf Hitler was responsible for World War II.

T F People only exist in relationship to other people.

Although many historians and political scientists would place the
major responsibility for World War II on Hitler and Germany, they
would also trace the conflict to World War I and the resulting economic
sanctions. The answer, of course, would need to be qualified. The
second statement might fit neatly into some philosopher's bag or be
warmly embraced by the sociologist. Other philosophers and
psychologists might disagree entirely with it, or agree only if certain
qualifications and nuances are added. Who is right? There are few, if any,



ideas in the world of today that are so absolutely certain that simple
'Tt™ "t“st“"ptf t‘iS^ habits. Studenm - 
clever and will more^m^^ 

fives— five true and then five false? 


Granted that the preceding examples are obvious and [...]
disprove their validity? Could the perfect item that you come up with be
repeated in different areas over 100 times or more?
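
The guessing problem described in the first drawback above can be made concrete with a small calculation. The Python sketch below is only an illustration under a simplifying assumption that the text does not state: every item is answered by blind, independent guessing. The function names are hypothetical.

```python
from math import comb, sqrt


def guessing_stats(n_items, n_options=2):
    """Mean and standard deviation of the score from pure blind guessing."""
    p = 1 / n_options                      # chance of guessing one item right
    return n_items * p, sqrt(n_items * p * (1 - p))


def prob_at_least(score, n_items, n_options=2):
    """Chance that blind guessing alone yields `score` or more correct."""
    p = 1 / n_options
    return sum(
        comb(n_items, k) * p**k * (1 - p) ** (n_items - k)
        for k in range(score, n_items + 1)
    )


if __name__ == "__main__":
    mean_score, spread = guessing_stats(100)       # 100 true-false items
    print("expected score from guessing:", mean_score)
    print("standard deviation:", spread)
    print("chance of 60 or more by guessing:", prob_at_least(60, 100))
```

For a 100-item true-false test the expected score from guessing is 50, which matches the figure given above. The standard deviation indicates how far individual guessers are likely to stray from that figure.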

If you must write true-false items, bear in mind the following y 

lights”: 


1. Stay away from broad generalizations which obviously give
answers away. Statements that include such terms as never,
always, all, and every tend to do this.

2. Avoid items that are partly true and partly false.

3. Do not write unusually long and complex statements. Statements
of this kind are generally true and many students are aware of it.


Completion Items 

Completion items require the student to fill in a blank that completes the 
sentence or answers a specific question. For example: 

1. The Constitution requires that a member of the United States
Senate be at least ______ years old.

2. How old does a person have to be in order to be eligible for the
United States Senate? ______


The first statement requires the student to complete the sentence, whereas
the second asks the student to supply the answer. This last form is also called
by some a short-answer item. For the purposes of our discussion both will be
included under completion items.

The completion item is related to the essay item and stands
between the objective and essay test. On the one hand, it is objective, in the
sense that a prearranged answer can be chosen before testing; on the other
hand, it is related to the essay test because the student must produce the
correct answer rather than recognize it. The completion item is especially
useful for appraising your student's knowledge of facts, such as names and
dates. Its major limitation is its major asset; that is, its use to appraise factual
knowledge. It does not measure the student’s ability to apply or use
information. In addition it is difficult to phrase items with clarity and





TilTetl f “ ‘0 

simple task of asking for pure facts. 

The following are some suggestions for writing completion items: 


1. Write completion items in your own words and be sure that you 

limit correct responses to your actual achievement goals. 

2. Refrain from writing items that can be completed by general 
intellectual ability rather than knowledge of the subject. 

3. Do not give away the answer by varying the blanks according to
the number of letters in the word. Blanks should always be of standard
length.

4. Do not mutilate a passage with too many blanks.

Poor: In ______ (1917) Puerto Ricans were made ______ (citizens) of the
______ (United States) and given the right to ______ an upper house with
the ______ (President) reserving the ______ to ______ (veto bills) and appoint
the governor and certain other officials.

Better: Puerto Ricans were made citizens of the United States and
given the right to elect an upper house with the President
reserving the right to veto bills and appoint the ______ and
certain other officials in the year of ______.

5. The important word or words to be filled in should be at or near
the end of the statement. This allows the student the opportunity to read
the problem before the blank is seen.

Poor: ______ was the first United States Astronaut.

Better: The name of the first United States Astronaut is ______.

6. Construct your specific questions in a direct manner rather than
using incomplete sentences.

Poor: Columbus discovered America in ______.

Better: In what year did Columbus discover America? ______

7. State the words in which the answer is to be given.

Poor: Where does the Congress of the United States hold its
sessions?

Better: In what city does the United States Congress hold its
sessions?


Matching Items 

The matching item’s major advantage is that it compresses
material into a limited amount of space. An example of this type of item
follows:





____ 1. James Fenimore Cooper      a. Compensation
____ 2. Ralph Waldo Emerson        b. The Deerslayer
____ 3. Nathaniel Hawthorne        c. Ethan Brand
____ 4. Edgar Allan Poe            d. Ichabod
                                   e. The Legend of Sleepy Hollow
                                   f. A Psalm of Life
                                   g. The Raven

In the blank spaces students are requested to write the letter of each of the
titles on the right that corresponds to the authors on the left.

The matching item is not particularly well suited to measuring the student’s
ability to conceptualize. It is useful for appraising specific aspects of a subject
field, such as dates, leading personalities, definition of terms, meaning of
words, and association of authors with titles of books.

The following are some suggestions for writing matching items: 

1. All matching items should be presented intact. All items for each
set should be on a single page. Do not split the items from one page
to another. This may be confusing and time-consuming for the student.

2. The list of answer choices should contain at least two or three
more items than the number of problems, which will reduce the chance
of guessing. If both lists contained an equal number of items, the
student could more easily arrive at the correct response through the
process of elimination.

3. Do not mix subject fields. This means, for example, that if you
have dates in a response list, you should not mix names or other
unrelated items with them.

4. The instructions should clearly state the basis for matching.
Students should be told specifically what they are to do. For example,
if one answer is to be used more than once, be sure to explain this
in the directions.

5. Present response items in a logical order if at all possible:
alphabetically, chronologically, or any other format that assists the
student in quickly perceiving all the items. For example, note the
alphabetical list of authors and titles in our previous example.
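
Suggestion 2 can be illustrated with a little arithmetic. The Python sketch below is a simplified model, not a procedure from the text: it assumes that each answer choice may be used only once, that the student has already matched some problems correctly, and that he guesses blindly among the choices left over. The function name is hypothetical.

```python
from fractions import Fraction


def chance_on_next_guess(n_problems, n_choices, n_known):
    """Chance of guessing one remaining problem correctly by elimination.

    n_problems -- number of problems (for example, authors)
    n_choices  -- number of answer choices (for example, titles)
    n_known    -- problems the student has already matched correctly

    Assumes each choice is used at most once and the guess is blind.
    """
    if not 0 <= n_known < n_problems <= n_choices:
        raise ValueError("need 0 <= n_known < n_problems <= n_choices")
    remaining_choices = n_choices - n_known
    return Fraction(1, remaining_choices)


if __name__ == "__main__":
    # Equal lists: four problems, four choices, three already known.
    print(chance_on_next_guess(4, 4, 3))
    # Extra choices, as in the authors-and-titles example: four problems, seven choices.
    print(chance_on_next_guess(4, 7, 3))
```

With equal lists, a student who knows three of the four matches gets the fourth by elimination alone. With seven choices for four problems, as in the example of authors and titles, four choices remain and the blind guess succeeds only one time in four.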

Remember that you are attempting to appraise the student’s knowledge
of the subject field, not his intellectual acumen in figuring out obscure
directions. If your matching items and directions are unclear, you may
penalize the student who knows the subject but is not test-wise or skillful
at figuring out puzzles.


Multiple-Choice Items 

The multiple-choice format is one of the most popular and effective of all 
the objective tests. It consists of two parts: (1) the stem, which states the




problem, and (2) a list of options, one of which is to be selected as the
answer. The stem may be stated as a question or as an incomplete statement.
It does not matter much which form is used. Most experienced test writers
prefer the incomplete statement because it allows the reader a smooth
transition from reading the stem to selecting the appropriate item. On the
other hand, inexperienced test writers prefer the question form because it is
easier to construct and helps the writer to state the problem cogently.

In constructing an incomplete statement item, be very careful in phrasing
the options. Each option should follow the stem in a smooth manner. An
awkwardly phrased option may be a cue that it is not the correct answer.
Examples of both forms follow:

Question: What is one of the important causes of mental
retardation?

A. Poor schools.

B. Smoking.

C. Heredity.

D. Nuclear fallout.

Incomplete statement: One important cause of mental retardation is

A. poor schools.

B. smoking.

C. heredity.

D. nuclear fallout.


The multiple-choice item can be used to appraise almost any educational
objective with the exception, of course, of student organization and ability
to produce answers. Some of the areas it is effective in measuring are the
following:


1. Information.

2. Vocabulary.

3. Isolated facts.

4. Cause-and-effect relationships.

5. Understandings.

6. Insight and critical analysis.

7. Solution of problems.

8. Interpreting data.

9. Application of principles.


Many people believe that the multiple-choice item
requires little thought and no understanding. In order to combat this
myth the Educational Testing Service (1963) prepared a booklet to show in
concrete terms the falseness of the charge. Note that in Figure 42 a piece of
sculpture is presented and the student is required to place it in its proper






In which of the following centuries was the piece of sculpture shown above most
probably produced?

(A) The fifth century B.C.

(B) The fourteenth century A.D.

(C) The sixteenth century A.D.

(D) The eighteenth century A.D.

(E) The twentieth century A.D.

Figure 42. Example of a multiple-choice item testing depth of knowledge.
(From Multiple-Choice Questions: A Close Look. Copyright © 1963, Educational
Testing Service. All rights reserved. Reprinted by permission of Educational
Testing Service.)


time period. Here the student must apply learned data to the situation. It is 
not simply an exercise or recitation of memorized information. 

The multiple-choice item is very versatile and relatively free of some of 
the problems that afflict other types of objective items. The multiple-choice 
item does not require that one option or alternative be completely correct. 
On the contrary, the requirement is only that one option be significantly 




more correct. The difficulty of a multiple-choice item will depend on the
plausible incorrect options [...]

[...] product, however, is well worth the [...]

[...] It is hoped the following suggestions and
examples will help in this endeavor:

[...] which are true and others false.

Poor: In the Midwest

A. there are many mountains.

B. tornadoes occur most [...]

C. people are politically more conservative.

D. Chicago is the largest city.

[...] which the teacher intended, or (B), which [...]

Better: The most heavily populated city in the Midwestern region
of the United States is

A. New York.

B. Chicago.

C. Cleveland.

D. Los Angeles.

Note that if the student chooses (A) or [...] (C) [...] his lack of [...]

[...] examples:




Poor: Schizophrenia is a term used in psychology to characterize

A. racial discrimination.

B. a group of children. 

C. a group of psychotic reactions. 

D. a group of Indians. 

The student who has read and listened in class at even a very minimal 
level should be able to guess that (C) is the correct response. He 
therefore rules out all other options. We have appraised only his 
intellectual ability to rule out obviously incorrect alternatives. 

Better: Schizophrenia is a term used in psychology to characterize 

a group of 

A. neurotic reactions. 

B. organic disturbances. 

C. psychotic reactions. 

D. manic-depressive reactions. 

In the better version the student knows that all four options are
psychological terms dealing with disturbed behavior. The problem is
which alternative best characterizes schizophrenia. The knowledgeable
student knows that (A) is incorrect because schizophrenia constitutes
a loss of reality. He also knows that schizophrenia may have
an element of organicity (physical causes) and sometimes behavior
similar to those classified as manic-depressive. He chooses (C)
because he knows that the other responses, although related, are
separate diagnostic entities and schizophrenia includes a group of
psychotic reactions. Thus using all plausible options curtails
intelligent guessing.

4. The length of the options should be consistent and not vary with
their correctness or incorrectness. If this is not done students will
become aware of the fact that the long options are correct or vice versa.
Using our previous example, let us modify the options:

Poor: A. neurotic reactions.

B. organic disturbances.

C. psychotic reactions which reveal various degrees of ego
disintegration.

D. manic-depressive reactions.

Note that in the changed version the correct answer (C) is longer,
whereas in the original (better version) all four of the options are
consistent in length.

5. The stem should not contain an excessive number of words
unless your goal is to evaluate the student's ability to select basic
facts, or unless your objective is to play an intellectual variety of
hide and go seek. Note the following two examples:





Poor: On March 4, 1801, Thomas Jefferson, leader of the victorious
Republican Party, walked from his capital boarding
house and took the oath of office. He looked the part of a
farmer and, indeed, he was one, for his biggest interest,
next to his country, was his affection for his beautiful home
and estate in Virginia. He was a philosopher and lover of
peace and was happiest when he could think about problems
in art, religion, or science. The most important event of
his first administration was

A. the repeal of the Naturalization Act. 

B. the purchase of the Louisiana territory, 

C. the repeal of the excise tax. 

D. bringing the Barbary pirates to terms. 


Better: The most important event of Thomas Jefferson’s first
administration was

A. the repeal of the Naturalization Act.

B. the purchase of the Louisiana territory.

C. the repeal of the excise tax.

D. bringing the Barbary pirates to terms.


Note that in the poor example the student had to wade through
irrelevant material before arriving at the problem, whereas in the 
better example the stem immediately confronted the student with his 
task. 

6. Maintain correct English usage throughout the item.

7. Place the correct option randomly throughout the test. That is,
do not place most of the correct options in the (A) position, (B) position,
and so on. Nunnally (1964) suggests a method that is easy to use and
will arrange your options in a random order:


One way to rearrange the alternatives in a random sequence is by 
the use of shuffled cards. If, for example, there are five alternative
answers for each item, the letters a, b, c, d, and e are written
respectively on five cards. These are then shuffled four or five times and
dealt out. The letter on the first card determines the position of the 
first alternative as it appeared when the item was constructed, which, 
... is usually the correct alternative. The next card dealt determines the 
position of the first incorrect alternative, and so on until the positions 
of all five alternatives are determined. When such a random procedure 
is used, students cannot accurately detect patterns in the ordering of 
alternatives [pp. 124-125]. 

8. Do not use such options as none of these, both a and c above, or
all of the above. The only time you may use such terms is in a
mathematical problem requiring mechanical computation. For example,





Poor: Mary has an IQ of 103. Her intelligence is 

A. superior. 

B. genius. 

C. average. 

D. none of the above. 

The use of none of the above might penalize the good student who
knows “too much.” Instead of answering (C), he might think the
teacher wants (D), because IQ is not necessarily equivalent to
intelligence and therefore on the basis of a test one cannot classify
Mary’s intelligence. If, on the other hand, we substitute below average
for none of the above, the student is forced to make a decision, and
knowing that an IQ of 103 in the general classification system is
considered average, he will respond appropriately and save his dissent
or criticism of the item for later discussion.

9. Accentuate the positive and eliminate the negative. In a learning 
situation it is much better to emphasize the positive than the nega- 
tive aspects of learning. There are times, however, when the only 
thing one can do is to have students look for an option that does 
not relate or follow from a given principle. Negative statements used 
sparingly are acceptable. The problem is, however, that you may 
be reinforcing learning of erroneous concepts; it is also confusing 
to students who are geared to looking for correct rather than incor- 
rect responses. If you must use a negative word in the stem it should 
be set off from the sentence by capitalization or underlining. For 
example,

According to the United States Constitution, citizens do NOT have

the right to 

A. free assembly. 

B. freedom of religion. 

C. make arrests. 

D. advocate sedition. 

10. Develop your test according to your educational objectives and 
the rules of good test construction. This last suggestion is obvious but 
nevertheless needs to be said. Remember that the “ideal” test, follow- 
ing the suggestions that have been made, is meaningless if it does not 
measure what you want. This means that you must judge the skills, 
understandings, facts, and other factors you consider important 
within the structure of good test construction. A test ideal from a 
test maker’s point of view may be poor from a teacher’s perspective. 
The important thing to do is to combine both so as to have the best 
possible instrument to help you help your students in their educational 
progress. 
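
The card-shuffling procedure quoted under suggestion 7 can also be carried out by a short program. What follows is only a minimal sketch in Python and is not part of Nunnally's text. The function name `shuffle_positions` and the convention that the correct alternative is written first in the list are assumptions made for illustration.

```python
import random


def shuffle_positions(options, rng=None):
    """Place each alternative at a random position, as dealt letter cards would.

    `options` holds the alternatives in the order they were written, with
    the correct alternative first (an assumed convention). Returns the
    reordered alternatives and the letter that now holds the correct one.
    """
    if rng is None:
        rng = random.Random()
    letters = [chr(ord("A") + i) for i in range(len(options))]
    cards = letters[:]
    rng.shuffle(cards)  # shuffle the lettered cards
    # The i-th card dealt fixes the position of the i-th alternative as written.
    placed = dict(zip(cards, options))
    ordered = [placed[letter] for letter in letters]
    key_letter = cards[0]  # the first card dealt locates the correct alternative
    return ordered, key_letter


options = [
    "the purchase of the Louisiana territory.",  # correct alternative, written first
    "the repeal of the Naturalization Act.",
    "the repeal of the excise tax.",
    "bringing the Barbary pirates to terms.",
]
ordered, key_letter = shuffle_positions(options)
for letter, text in zip("ABCD", ordered):
    print(f"{letter}. {text}")
print("Key:", key_letter)
```

Running the function once per item gives each item its own independent ordering, which is the point of the procedure: no pattern in the position of the correct option is left for students to detect.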



Mechanical Operations


[...] creative, it is a necessary part of test production.


Item Writing 

Professional item writers use separate 5- by 8-inch cards on which they
write their items. This procedure may be of help in putting together the
final product. At the bottom of the card, list the area and skills tested by the
item. The intended answer key (A, B, C, or D) should be written on the back
of the card.


Assembly 

You now have a group of items that need to be integrated into a test format. 
Check the items against your course objectives and make sure you have 
covered all the areas you consider important. Be sure you have enough items
to make your test reliable. 

Editing and Arranging 

Be sure that your items follow the suggestions for good item construction. 
Check your English usage to see that it is grammatically correct and that you
have not inadvertently misspelled or omitted a word. Next, see if you can
arrange for a colleague to read over your test and attempt to answer the
questions. Incorporate his suggestions into your test if they seem appropriate.

After you have edited your examination arrange the questions in order of 
difficulty. This can be done by establishing categories representing degrees 
of difficulty, for example: very hard, hard, average, easy, very easy. The
distribution of item difficulty should be similar to the normal curve
distribution. That is, there should be a few very hard and a few very easy
questions. The rest of the items should be of moderate difficulty. There is
no point in having items that no students can answer or items that everyone
answers correctly. Before you administer the test, of course, you can only
estimate the difficulty of items; however, as we shall discuss later, an item
analysis of the test results will establish the degrees of difficulty more
accurately.

After you have ascertained the degrees of item difficulty your next step
is to arrange them in the order that they will be presented to the student.
First, be sure not to mix different formats; that is, all multiple-choice
items should be in the same section, all matching items should be in the
same section, and so forth. Second, within each type of item, arrange your
problems by degree of difficulty, starting with the very easy and ending with
the very hard items. It is very important to do this because some students,





especially the younger ones, may be discouraged by difficult problems in the 
beginning and consequently either not finish or not do their best. This 
arrangement is also helpful to students who work slowly. These students 
may know the answers to the easier questions but in a timed test may never 
reach them. If possible, also try to have topics follow in logical and coherent 
blocks of questions. Do not present, for example, an item on the Con- 
stitution, then one on the Bill of Rights, then one on the Constitution, and 
so forth. It is much more desirable to group the Constitution questions 
together and then the Bill of Rights questions together. This presents whole 
blocks of subject matter together and allows the student to attack the prob- 
lems from a common frame of reference. (If you arrange according to groups 
you will then, of course, place the items within groups from easy to hard.) 
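
The arranging rules just described (one section per item format, blocks of related topics within a section, and easy-to-hard order within a block) amount to a three-level sort. The following Python sketch shows one way a teacher who keeps items in a file might apply them. The dictionary keys `format`, `topic`, and `difficulty`, and the function name `arrange_items`, are hypothetical names chosen for this illustration, and the difficulty labels are the teacher's own estimates.

```python
# Difficulty categories from the text, ordered from easiest to hardest.
DIFFICULTY_ORDER = ["very easy", "easy", "average", "hard", "very hard"]


def arrange_items(items, format_order, topic_order):
    """Order items by format section, then topic block, then difficulty."""
    def sort_key(item):
        return (
            format_order.index(item["format"]),
            topic_order.index(item["topic"]),
            DIFFICULTY_ORDER.index(item["difficulty"]),
        )

    return sorted(items, key=sort_key)


items = [
    {"id": 1, "format": "multiple-choice", "topic": "Bill of Rights", "difficulty": "hard"},
    {"id": 2, "format": "matching", "topic": "Constitution", "difficulty": "easy"},
    {"id": 3, "format": "multiple-choice", "topic": "Constitution", "difficulty": "very hard"},
    {"id": 4, "format": "multiple-choice", "topic": "Constitution", "difficulty": "very easy"},
    {"id": 5, "format": "multiple-choice", "topic": "Bill of Rights", "difficulty": "easy"},
]
ordered = arrange_items(
    items,
    format_order=["multiple-choice", "matching"],
    topic_order=["Constitution", "Bill of Rights"],
)
print([item["id"] for item in ordered])  # prints [4, 3, 5, 1, 2]
```

In the sample, the multiple-choice section comes first, its Constitution items precede its Bill of Rights items, and each block runs from easier to harder.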

Directions 

The detailed nature of test directions will depend on the grade and test 
sophistication of your students. Explicit and understandable directions are 
extremely important. You may present your written directions on the front 
page along with a place for name, date, class, and any other data you deem 
necessary. 

The directions should be completely clear on the exact method to be used 
in recording the answers. In addition, the student should be told of the
proposed scoring procedure and credits to be allowed for each section or 
item of the test. 

The following is an example which may be adapted for most objective tests. 

Directions : This is a fifty-minute test. There are seventy-five ques- 
tions. Mark only one answer to each question. Mark the answer you 
think is correct (see example below) by drawing an “X” through the
letter that best corresponds to the correct answer for each problem. 
Do not make any marks on the test. Make all of your marks on the 
separate answer sheet. 

Do not waste time on questions that seem too difficult. It is better 
to go on and finish the test and then go back to the difficult questions. 
Remember it is better to make a careful guess than to spend too much 
time on any one item or to leave an answer blank. Your score will be
based on the number of correct answers you mark. 

Example:

Test Problem                                      Answer Sheet

The first president of the United States was

A. Abraham Lincoln

B. George Washington                              A  X  C  D

C. Benjamin Franklin                              (the X is drawn through the letter B)

D. Thomas Jefferson


Table 27 presents an example of a teacher-made answer sheet. 



Table 27. Part of a Teacher-Made Answer Sheet

Name

Directions: Mark an X through the letter that corresponds to the
best answer in the following way:

Example: A  X  C  D  (the X is drawn through the letter B)

Remember, mark only one answer to each question.

Question:    Answer:        Question:    Answer:

1.           A B C D        25.          A B C D

2.           A B C D        26.          A B C D

3.           A B C D        27.          A B C D

4.           A B C D        28.          A B C D

5.           A B C D        29.          A B C D



Physical Layout 

Remember, not only should your whole test, questions and language, be
geared for student understanding, but also the physical layout should be
arranged for your convenience in scoring. Here are some simple suggestions
that will help you and your students:

1. Test material and all items pertaining to it should go on the
same page. Do not, for example, have the test items that refer to a
chart on a separate page but include the problem, items, and chart on
the same page. Do not break up an item by having the stem and one
or two options on one page and the rest on the next page.

2. Arrange to have each multiple-choice option on its own line. 

3. Arrange your answer sheet (see Table 27) so that the answers 
are all in one column. This helps if you score by sight and is also
convenient if you use a scoring key which can be placed beside the 
column. 


Reproduction 

You now have before you the edited and arranged questions, directions, 
and answer sheet. Now type a rough draft to see how it looks. Decide on
your space requirements and other necessary mechanical problems. After
you have done this, reproduce the test, using a mimeograph or any other
process that is available to you. A good rule of thumb is to produce at least
ten extra copies for unexpected needs. Next duplicate your answer sheets.




Scoring 

Prepare your scoring key by marking the correct answers on an answer 
sheet. Although you could stop at this point and score the tests by
comparing the key to the students’ answer sheets, you should spend a little more
time now to save your eyes and time in the future. A good method to
accomplish this is to produce a scoring stencil. The scoring stencil is simply an
answer sheet with the correct answer punched out; it is placed over an 
answer sheet, thereby making it possible to count the correct responses. 
There are many ways of constructing a scoring stencil. One method is to 
paste a correctly marked answer sheet on a piece of cardboard or a group of 
five or six papers pasted together, and punch out holes in the marked spaces. 
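
Where answer sheets are recorded in a file rather than on paper, the work of the stencil reduces to counting the positions at which a student's mark agrees with the key. The following is a minimal Python sketch under that assumption. The function name `score_sheet` and the use of `None` for a blank or double-marked item are conventions invented for this illustration.

```python
def score_sheet(responses, key):
    """Count the responses that agree with the key, as a stencil would reveal.

    `responses` and `key` are lists of letters in question order. A blank
    or double-marked item is recorded as None (an assumed convention) and
    therefore never matches the key.
    """
    if len(responses) != len(key):
        raise ValueError("answer sheet and key must cover the same questions")
    return sum(1 for marked, correct in zip(responses, key) if marked == correct)


key = ["B", "A", "D", "A", "C"]
student = ["B", "C", None, "A", "C"]
print(score_sheet(student, key))  # prints 3
```

This matches the scoring rule announced in the sample directions, where the score is the number of correct answers marked.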

In a large school system you may have an electrical scoring machine such 
as the IBM 805, which will save a great deal of time. In that case you will, of
course, use standard IBM answer sheets similar to those used with
standardized tests. You will then only have to record the correct answers on an
IBM answer sheet using an IBM pencil or, with some machines, a number 2
pencil. This stencil is then placed in the machine.

Item Analysis 

Many teachers think that after they have developed, produced, admin- 
istered, and scored a test they are through. This is not the case at all. If you 
stop after scoring you will be losing much valuable data about your test and 
student reactions. Test analysis will give you clues to the achievements of 
your class and will aid in future teaching objectives. It will also tell you 
about the weaknesses of your test and help you make improvements in 
future examinations. 

The discussion that follows will be brief because the methods for analyzing 
test results were already presented in the chapter on achievement tests 
(Chapter 10). The same methodology of item analysis in standardized 
achievement tests, for the most part, can be used with your classroom tests.

You will first want to ascertain the degree of difficulty of your items.
This can be done in a small classroom (fewer than forty students) by selecting
the top ten and lowest ten students according to their test scores.* For each
of the ten top students go through the test and mark a 1 next to each response 
chosen. Do the same for the lowest ten using a different mark, such as an 
X. For example, note the following history item and the responses of the 
top and lowest ten. 

The first president of the United States was

                          A. Abraham Lincoln

XXXXXXXX  1111111111      B. George Washington

X                         C. Benjamin Franklin

X                         D. Thomas Jefferson

* For the most reliable and theoretically correct method, which would be needed
for larger groups, see Chapter 10.





An inspection of our example reveals that all the top ten students chose 
the correct answer and eight of the lowest ten also responded correctly.
This data, although based only on a sampling of the class, would be very
valuable to the teacher. On the basis of our example the teacher might
decide that the easiness level is too high and that the item does not
discriminate between good and poor students. For larger groups of students
and for tests of 100 questions or more, mechanical methods such as scoring
and punch cards would need to be used.
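
The tally just described can be reproduced in a few lines. The sketch below, in Python, counts each option for the two groups and then derives two summary figures. The text itself only inspects the tally; the easiness proportion and the difference between the groups' proportions correct are a common way of summarizing it and are offered here as an assumption, not as the author's formula. The name `item_analysis` is hypothetical.

```python
from collections import Counter


def item_analysis(top_choices, bottom_choices, correct):
    """Tally option choices for the top and lowest groups on one item."""
    top = Counter(top_choices)
    bottom = Counter(bottom_choices)
    n_top, n_bottom = len(top_choices), len(bottom_choices)
    # Proportion of the sampled students who answered correctly.
    easiness = (top[correct] + bottom[correct]) / (n_top + n_bottom)
    # Proportion correct in the top group minus that in the lowest group.
    discrimination = top[correct] / n_top - bottom[correct] / n_bottom
    return {
        "top": dict(top),
        "bottom": dict(bottom),
        "easiness": easiness,
        "discrimination": discrimination,
    }


# The history item above: all ten top students chose B; of the lowest ten,
# eight chose B, one chose C, and one chose D.
result = item_analysis(["B"] * 10, ["B"] * 8 + ["C", "D"], correct="B")
print(result["top"], result["bottom"])
print(result["easiness"], round(result["discrimination"], 2))
```

For this item eighteen of the twenty sampled students were correct, and the two groups differ by only two students out of ten, which is the numerical form of the teacher's judgment that the item is too easy and separates good from poor students only weakly.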

In analyzing data you will look for questions that no one answers correctly
or that receive the same number of correct responses from “poor” students 
as from “good” students. The question or the options may be poorly stated. 

If you are using a test primarily to rank your students for grading
purposes, your questions (ideally) should be so difficult that on a five-option
multiple-choice item only 60 per cent of the group obtain the right answer,
on a four-option multiple-choice item 62 per cent obtain the right answer,
and on each true-false item 75 per cent mark the correct response (Educational
Testing Service, 1961).
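
The text does not say how these three percentages were obtained. One reading, offered here purely as an assumption, is that each lies halfway between the score expected from blind guessing and a perfect score. The short Python sketch below computes that midpoint for any number of options; the function name `midpoint_difficulty` is hypothetical.

```python
def midpoint_difficulty(n_options):
    """Halfway point between the chance proportion correct and 1.0.

    This midpoint rule is an assumed interpretation of the quoted
    figures, not a formula stated in the source.
    """
    chance = 1 / n_options  # proportion correct expected from blind guessing
    return (chance + 1) / 2


for label, n in [("five-option", 5), ("four-option", 4), ("true-false", 2)]:
    print(label, round(midpoint_difficulty(n) * 100, 1), "per cent")
```

Under this reading the five-option and true-false figures come out at 60 and 75 per cent, and the four-option figure at 62.5 per cent, which agrees with the quoted 62 per cent if the fraction is dropped.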


A Case History 


The following case history is presented as an illustration of some of the
concepts, procedures, and problems in objective test construction that face
the teacher. In many ways it is a synthesis of our previous discussion placed
in an actual school setting. 


An Objective Achievement Test on Fifth-Grade Arithmetic*

It was near the end of the school year [...] Mrs. Jackson, [...]
teacher, decided to give her pupils an [...]
year’s work. Her first step was to list the [...]
hoped to get from the test. She [...]
to get a general picture of [...] indication of
over-all areas of strength and [...]. Secondary purposes she
listed were (1) to identify [...] be especially weak
in a particular arithmetic skill and (2) to measure the relative abilities
of her students for purposes of [...].

In trying to get an accurate picture of class achievement,
she decided that there were two [...] the
year’s work: one was according to the kind of computation required,
and the other was according to the way the problem was presented.

* From Making the Classroom Test [...] Evaluation and Advisory
Service [...] Copyright 1959 by Educational Testing Service.
Second Edition 1961. Reprinted by permission
of Educational Testing Service.



The kinds of computation required were 

1. Multiplication 

2. Division 

3. Addition and Subtraction of Fractions 

4. Measuring (distance, time, weight, temperature, etc.) 

5. Decimals 


The ways the problems were presented were 

1. Simple computation, such as 21)$1.05 or 1/2 + 1/3

2. Problems requiring use of procedures learned previously, such as 

John missed 1/5 of the twenty words on a spelling test. 
How many words did he miss? 

or 


A group of twenty-nine children were making programs for a 
school assembly. They needed 435 programs. How many did 
each child have to make? 

3. Problems requiring original thinking by pupils and use of
“number sense.” In these problems the pupils could not depend
on previously learned procedures for a method of solution but
must develop their own procedures for solution. Two problems
of this type follow:


Problem one. Explain how you, as a fifth-grade pupil, using
ten blocks, could prove to a fourth-grade pupil that 1/2 is
bigger than 2/5.


Problem two. You are standing directly in front of Building A
and looking off at Building B in the distance. Here is the way 
the two buildings look to you: 







The rooms in both buildings are the same height. By looking
at the windows, decide which of the following is true:


(A) Both buildings are the same height.

(B) Building A is two-thirds as high as Building B.

(C) Building A is one and one-third times as high as Building B.

(D) Building A is twice as high as Building B.

(E) You can't tell from looking at the buildings which one
is higher.


(Answer: B)

Using these two ways for classifying the questions (according to the
kind of computation required and according to the way the problem
was presented), Mrs. Jackson was now ready to make a written plan
for the test. She intended that this plan would provide for a test
paralleling the emphasis given to various points in class.

Mrs. Jackson wrote out her test plan in the form of a "two-way
grid." In a two-way grid each question is classified in two dimensions.

The two-way grid that Mrs. Jackson made for the arithmetic test is
shown below. Since she planned to allow an hour for the test, she thought
40 questions would be about the right number. The numbers in the
boxes represent questions, each of a type described
by the two dimensions of the grid.


                                  Way Problem Presented

Kind of            Routine        Thought Problems        Thought Problems
Computation        Computation    Following Procedures    Requiring Students to
Required                          Taught Previously       Develop New Procedures

Fractions               7                  4                        1

Multiplication          2                  3                        1

Division                3                  4                        1

Measuring               1                  5                        1

Decimals              [...]              [...]                      1

After Mrs. Jackson completed the two-way grid, she found it relatively
easy to write most of the questions for the test. She was able to
write many questions by paralleling questions from the arithmetic
textbook itself. However, she found it quite challenging to write the five
problems which would require students to develop new procedures.
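
A two-way grid of this kind is easy to keep as a small table in code, which lets the teacher check the row and column totals against the planned length of the test. The Python sketch below is an illustration only. The two Decimals cells that are not legible in this copy are stored as `None` rather than guessed, the short column labels are invented, and treating the planned total as exactly 40 questions is an assumption, since the text says only that 40 "would be about the right number."

```python
# Columns: way the problem is presented (short labels are hypothetical).
COLUMNS = ("routine", "taught procedure", "new procedure")

# Rows: kind of computation required. None marks a cell that cannot be read.
GRID = {
    "Fractions":      (7, 4, 1),
    "Multiplication": (2, 3, 1),
    "Division":       (3, 4, 1),
    "Measuring":      (1, 5, 1),
    "Decimals":       (None, None, 1),
}

PLANNED_TOTAL = 40  # assumed to be exact for this check


def known_total(grid):
    """Sum every legible cell in the grid."""
    return sum(n for row in grid.values() for n in row if n is not None)


def column_total(grid, column):
    """Sum the legible cells in one column."""
    i = COLUMNS.index(column)
    return sum(row[i] for row in grid.values() if row[i] is not None)


print(known_total(GRID))                    # prints 34
print(column_total(GRID, "new procedure"))  # prints 5
print(PLANNED_TOTAL - known_total(GRID))    # prints 6
```

The third column sums to five, which agrees with the five new-procedure problems mentioned in the text. If the plan did total exactly 40, the two unreadable Decimals cells would together account for the remaining six questions, but their individual values cannot be recovered from this copy.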





Mrs. Jackson believed that the test covered understandings and 
skills in which her pupils had been well prepared. Therefore, she 
expected the very best students to get all or nearly all of the questions 
right, and she expected even the below-average students to get a 
majority of the questions right. She did not, however, make the 
mistake of deciding in advance that some minimum score — say 28 
questions right (70%) — would represent a passing mark. She knew 
from previous experience that sometimes her questions turned out to 
be more difficult than they first seemed to her. She decided to wait 
until she could look at the scores actually made on the whole test and 
could scrutinize carefully any questions which proved particularly 
troublesome. 

As it happened, most of the students did well on the test, although 
no one had a perfect paper. On the basis of the test, Mrs. Jackson felt 
that her class had achieved the objectives of the work in arithmetic. 
She did notice, however, that a number of students had difficulty 
with the problems involving decimals. Therefore she decided to spend 
more time working on decimals in the few weeks remaining in the 
school year. And then there was one student who failed all the
division problems, although he did fairly well on the rest of the test. 
She arranged to give this student special help in division. 

Most of the students had between 30 and 35 questions right. 
However, there were a few who scored above, and a few who fell
below this middle range. Knowing which students were in the middle 
and which were above or below was useful to Mrs. Jackson in assigning 
report-card grades. Of course she also took into account each pupil’s 
class work and his standing on other tests. 

In evaluating her test, Mrs. Jackson felt it had been reasonably 
successful in meeting the purposes for which she had planned it. 
The test had given her a good picture of over-all class achievement 
and it had pointed up the weakness in decimals. It had not been 
planned to be highly diagnostic, but it had helped to identify one 
pupil who was especially weak in division. In addition, although the 
test did not rank all of her students in the exact order of their
arithmetic abilities, it had given her information that was useful for
grading purposes.

References 

Educational Testing Service. Making the classroom test: A guide for teachers. Evaluation
and Advisory Service Series, No. 4. (2nd ed.) Princeton, N.J.: Educational
Testing Service, 1961.

Educational Testing Service. Multiple-choice questions: A close look. Princeton,
N.J.: Educational Testing Service, 1963.

Nunnally, J. C. Educational measurement and evaluation. New York: McGraw-Hill, 1964.



CHAPTER

15


Grades and Report Cards 


Thus far we have primarily devoted our discussions to instruments that 
attempt to appraise student progress objectively. In this chapter our attention
will be directed to the assignment of labels to test results and other evidence
of pupil achievement. The best-constructed tests will be rendered useless
if after reviewing the results the teacher assigns grades without a structured 
and coherent plan. Grading is subjective, but this subjectivity may be kept 
to a minimum by appropriate grading practices. 

The assignment of report card grades is a difficult task. However, if a 
logical and rational system is followed, these tasks become less difficult.
In addition, if information concerning a student’s ability and progress is 
reported in understandable terms, fewer parents will become irate. 


Philosophy 

The individual teacher must think about grades in the context of learning
and reality, must formulate a philosophy and approach to grades and report
cards, and then act accordingly. In determining [...]
dealing with the effects of grading on learning and motivation must be


answered. These philosophical conclusions must then be translated into the 
context of the school setting and its own unique demands and realistic 
problems. If, for example, you decide that grading is inaccurate and un- 
reliable and the school policy is directly in conflict with your ideals, do you 
completely capitulate or do you, on the other hand, go on strike and refuse 
to cooperate with the school administration? Or do you work within the 
system and attempt to modify existing practices? 

This chapter is based on the last approach — that grades are with us, like 
it or not, and our job is to make them as accurate and effective as possible. 
The teacher who is against grades can in the classroom place more emphasis 
on the rewards of learning and less on the importance of grades. The teacher 
can take part in teacher committees formed to review and modify the existing 
grading policies. This approach incorporates individual philosophy within
the reality of the school setting. 


Purposes of Grades 

Remmers, Gage, and Rummel (1965) list eleven common purposes of 
marking systems: 

1. Information for parents on pupil status or progress
2. Promotion and graduation
3. Motivation of school work
4. Guidance of learning
5. Guidance of educational and vocational planning
6. Guidance of personal development
7. Honors
8. Participation in many school activities
9. Reports and recommendations to future employers
10. Data for curriculum studies
11. Reports to a school the pupil may attend later [p. 288]

In reviewing the preceding eleven points the teacher who is adamantly
opposed to grades must admit that they do serve many functions. Some of
these purposes, such as “motivation” and “guidance of learning,” may
possibly be discarded in an ideal situation where students learn “for the
sake of learning,” but some of the other purposes would be difficult to fulfill
without grades or another type of evaluative device.


Assigning Test Grades 

Students need to know how well they perform on a classroom test. They
must know the areas of their subject-matter strength and weakness. This
can be accomplished by going over the test, item by item, and allowing the
students to ask questions. The teacher and his students can together decide
where certain competencies are lacking or whether there was a misunderstanding
of the test question. The teacher may decide that a test item was
poor and the student’s “incorrect” response might be considered correct.

Students also need to know what their final scores mean. Not many 
students in our American schools will be satisfied to know only how many 
questions they answered correctly. They want an evaluation: “Is my score
good, fair, or bad?” Let us look at some possible approaches.


Percentages 

A traditional approach in assigning test grades is arbitrarily to decide that
all students who answer 92 per cent of the questions correctly receive an A;
those who answer 85 per cent correctly receive a B; 75 per cent correctly
receive a C, and so forth. This type of assignment does a grave injustice to
the objective test, because many easy items would have to be included to
ensure that the average pupil obtained a score over 75 per cent. In our
earlier discussion of the objective test it was stated that a test that had too
many easy or too many hard questions was a poor test. Thus to follow the
percentage system the teacher would have to go against one of the basic rules
of objective test development. It is, therefore, strongly recommended that
this type of grading not generally be used with objective tests.
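
The percentage approach can be sketched in a few lines of Python. The cutoffs for A, B, and C (92, 85, and 75 per cent) come from the text; the D cutoff of 65 per cent is only an assumed value standing in for "and so forth," and the function name and the sample scores are hypothetical.

```python
# Fixed-percentage grading: a minimal sketch.
# A, B, and C cutoffs are from the text; the D cutoff (65) is an assumed value.
CUTOFFS = [(92, "A"), (85, "B"), (75, "C"), (65, "D")]


def percentage_grade(correct, total_items):
    """Return the letter grade for a raw score under fixed percentage cutoffs."""
    percent = 100 * correct / total_items
    for minimum, letter in CUTOFFS:
        if percent >= minimum:
            return letter
    return "F"


# Suppose a pupil answers 36 of 60 items (60 per cent) on a test whose items
# are of middling difficulty: under this scheme that pupil fails.
print(percentage_grade(36, 60))  # F
print(percentage_grade(56, 60))  # A
```

The sketch shows why the scheme pushes the teacher toward easy items: unless most pupils answer at least three quarters of the questions correctly, most of the class falls below a C.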

The use of this grading system in essay tests, however, is not as inappropriate
as it is with objective tests. The percentage assignment is only another
method of conveying the teacher's evaluation of the paper. It should be
mentioned, however, that a grade of 86, for example, might be confusing to
the student, because in essay examinations the criteria for scoring are not
always evident.


Grading on the Curve 

Grading on the curve assumes a given number of grades,
A through F, which are determined by the relative positions of the students
in comparison to one another. For example, note the following distribution:


Grade    Percentage of Students

A        10
B        15
C        50
D        15
F        10





This form of grading, which presupposes that achievement will be 
normally distributed in a given classroom and that, therefore, a certain 
number of A’s, B’s, C’s, D’s, and F’s must be given in relationship to the 
normal curve distribution, is a prostitution of statistics and a poor and unfair
way to grade. The classroom does not contain enough pupils for the assumption
to be made that there is a normal distribution of students. A much
greater number than the usual twenty to forty pupils per class would be 
needed. Thus, the teacher may have a bright or dull class which is not at all 
representative of different levels of ability. If the teacher follows the curve in 
this setting, some children in the bright group would be doomed to failure 
and some children in the dull class would have to receive A’s. Promotion 
and test grades would then be linked to the chance of what group a student 
found himself in. If an average pupil could arrange to be in the dull group, 
he probably would graduate in the top of his class. The use of the curve in 
the upper-middle-class high school and college and graduate school presents 
a similar problem. In these settings there tends to be a skewed curve, that is, 
students at the low end may be in the bright range of ability in the general 
population and students at the top in the very superior classification. If 
curving was adhered to completely in such situations, bright and superior 
students might have difficulty in graduating. Thus in a statistical sense the 
curve is an erroneous method of evaluation because the groups are too small, 
and in selected groups there is not a representative distribution. 
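
The dependence of a curved grade on the group can be illustrated with a short Python sketch. It applies the 10-15-50-15-10 split given earlier to two invented classes of twenty pupils; every score, name, and function here is hypothetical and serves only to show that the same raw score can earn an F in one class and an A in another.

```python
# Grading on the curve: a minimal sketch using the 10-15-50-15-10 split.
# All scores below are invented for illustration.
QUOTAS = [("A", 10), ("B", 15), ("C", 50), ("D", 15), ("F", 10)]


def curve_grades(scores):
    """Assign grades by rank alone; tied scores keep their input order."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    grades, start, cumulative = {}, 0, 0
    for letter, share in QUOTAS:
        cumulative += share
        end = round(cumulative * len(ranked) / 100)
        for name in ranked[start:end]:
            grades[name] = letter
        start = end
    return grades


def make_class(other_scores, pupil_score=45):
    """Build a class of classmates plus one pupil with a fixed raw score."""
    scores = {f"classmate_{i}": s for i, s in enumerate(other_scores, start=1)}
    scores["average_pupil"] = pupil_score
    return scores


bright_class = make_class([46 + i % 14 for i in range(19)])  # classmates score 46-59
dull_class = make_class([20 + i for i in range(19)])         # classmates score 20-38

print(curve_grades(bright_class)["average_pupil"])  # F
print(curve_grades(dull_class)["average_pupil"])    # A
```

The pupil's performance is identical in both cases; only the company has changed.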

Stanley (1964) in discussing the problems of “grading on the curve” 
cites an amusing story: 

On the first day of class a graduate professor of Latin informed the 
seven students taking his advanced course that he had learned of 
grading on the curve the previous summer and would use it in this 
class, resulting in certainty that one of the seven students would fail 
the course. As the students left at the close of class, one of them 
muttered to the other six, “I’m sure to be the one who fails, so I’m 
dropping the course right now.” 

“But you can’t do that,” the others exclaimed, “because we don’t
know which one of us would fail then.” 

So the six pitched in and paid the predestined failure to stay in 
the course and absorb the failing grade [pp. 326-27]. 

In addition to its statistical limitations, the curving of grades leads to 
undesirable attitudes. The quest for grades and the competitiveness among 
students is greatly increased. The typical student cannot help but hope that 
his friends will do poorly so as to enhance his own grade. 

Still another drawback lies in the position in which it places the teacher. 
If she assists one student to do better on a test, she is automatically relegating 
another pupil at least one slot back. In some cases this may mean helping 
one student to pass at the expense of another’s failure. 




Lindvall (1961) examines the problem of the distribution of test scores, 
stating that the assumptions based on the normal curve: 

involve about as many subjective judgments on the part of the teacher
as does any other grading procedure. The teacher must use his judgment
in constructing a test he feels should produce a normal distribution
of scores if given to a large group of pupils. . . . This assumption
can never really be given an exact check. The form of the distribution
of any set of test scores depends on the difficulty of the items
included in the test as well as on the ability of the students, and it is
doubtful that many teachers consistently produce tests that yield
approximately normal distributions. As a result most teachers using the
normal-curve method of grading are probably using it with measures
that are not distributed normally.

Even under ideal conditions the normal curve method is not the
objective and scientific procedure for assigning grades that some
persons assume it to be [p. 256].

. . . our discussion of grading on . . . in classroom evaluation.

Absolute Standards 

The teacher says, . . . discipline, but the . . . not very large. It . . .
absolute standards when dealing with children because children as well as
groups differ. Not many teachers would give all F’s or all A’s; nor would
principals . . .

The Answer?

There is no easy solution to . . . grades on an examination.
The fairest approach seems to be a combination of some of the
approaches already . . . standards should be tempered by
the performance of the class . . . A given amount of . . .
be learned, and within this . . . grades will be decided . . .
relative merits of the entire . . . This may sound like . . .
and it is in the sense of comparing students to one another. It is . . .
sense of predetermining . . .

In determining grades one must . . . instructional objectives . . .
purposes of the test . . . unit.
These goals will weigh heavily in the consideration of grade categories. For 
example, you are teaching arithmetic in fourth grade. One of your basic 
objectives is to teach children the multiplication tables. A passing grade in 
fourth-grade arithmetic means partly that a child has learned how to multiply. 
In this situation an absolute standard in grading is necessary. You cannot 
certify that a child has learned how to multiply on the basis of his class 
ranking in a given test or on an arbitrary percentage of the questions answered 
correctly. Grading must be in line with your educational objectives. 

In another situation where there is more latitude in objectives, the teacher 
may want to allow the B, C, and D grades to be distributed according to the 
performances of the students. There will, of course, not be a predetermined 
number of grades. The A and F grades may entail the classical percentages 
approach, allowing these percentages to be relevant to the specific educational 
objectives. In an objective examination the minimum level for a passing 
grade should probably be kept between 40 and 45 per cent, because even 
poor students will be able to guess some of the answers. 
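
The reasoning about guessing can be made concrete with a little arithmetic. The sketch below computes the score a pupil would be expected to reach by blind guessing alone; the number of answer options per item is an assumption, since the text does not specify one, and both function names are hypothetical.

```python
# Chance-level scores under blind guessing: a minimal sketch.
# The number of answer options per item is an assumption; the text gives none.
def expected_chance_score(n_items, n_options):
    """Expected number of items answered correctly by guessing alone."""
    return n_items / n_options


def chance_percent(n_options):
    """Chance-level score expressed as a percentage of the items."""
    return 100 / n_options


for n_options in (4, 5):
    print(n_options, expected_chance_score(60, n_options), chance_percent(n_options))
# 4 15.0 25.0
# 5 12.0 20.0
```

Under these assumptions, a passing floor of 40 to 45 per cent sits well above what guessing alone would be expected to yield on four- or five-option items.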

Individual test grades should be a device to convey relative performance. 
Of course, the student’s test scores throughout the semester or year should 
be averaged. The final results may be quite different from an assigned grade
on one test. 

In awarding test grades what is important is the relationship of these 
symbols to overall educational concerns and objectives. Combining these 
with the most appropriate mechanical means for assignment of grades will 
produce the most equitable evaluation. 

Assigning Report Card Grades 

The teacher has prepared the educational unit, taught the lessons, and 
measured the outcomes. Now final grades must be assigned. What factors 
should be considered? Should classwork as well as tests be counted? How 
should different sources of evaluation be weighed? 

Keeping in mind the educational goals, the teacher must decide what 
evidence will indicate the accomplishment of these goals. A decision must 
be made concerning the relative weight to be placed on different types of 
data, including examinations, quizzes, papers and reports, and classroom 
participation. 


Examinations and Quizzes 

All the student’s raw scores, not his test grades, on the objective tests and
quizzes he has taken are totaled. For example, Bill Ross, a student, has
obtained scores of 40 and 55 on the two objective tests. Each test had a total
of 60 items. The maximum possible score that Bill could receive would be
120. His overall objective test score is 95. The same process is used for
grades obtained on quizzes. Each student’s scores are tallied in the same





the scores to a common denominator or numerical scale. If, for example, all 
our data are to receive equal value in determining the final grade, we can use 
the method described in the section on papers and reports. Let us look at
Bill Ross’s record as an example of this method:


Average Scaled Scores

Quizzes   Objective Tests   Papers   Reports   Average   Final Grade

2.4       3.0               3.3      2.0       2.6       C+


In the preceding example the quizzes and objective test grades were assigned
the same values as papers and reports, that is, A = 4, B = 3, C = 2, and
so forth. These sources of evaluation were given equal weight to simplify
the example. If we were to decide to weight different areas differently, the
same basic process could be followed but more value would be given to the
area of most importance. For example, if the quizzes and objective tests were
to be counted as worth two times as much as the other factors, their value
would become 4.8 and 6.0, respectively. (We would then, of course, divide
the total by 6 rather than by 4.)
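
The averaging just described can be worked through in Python. The four scaled scores are Bill Ross's figures from the table; the dictionary keys and the function name are illustrative only. Straight arithmetic on the equal-weight case gives 2.675, so the 2.6 shown in the table appears to be that figure cut to one decimal place.

```python
# Weighted averaging of scaled scores: a minimal sketch.
# The four scores are Bill Ross's figures from the table above.
scaled = {"quizzes": 2.4, "objective_tests": 3.0, "papers": 3.3, "reports": 2.0}


def weighted_average(scores, weights):
    """Weighted mean: sum of score times weight, divided by the sum of weights."""
    total = sum(scores[name] * weights[name] for name in scores)
    return total / sum(weights.values())


equal = {name: 1 for name in scaled}
doubled = {"quizzes": 2, "objective_tests": 2, "papers": 1, "reports": 1}

print(round(weighted_average(scaled, equal), 3))    # 2.675
print(round(weighted_average(scaled, doubled), 3))  # 2.683
```

With the doubled weights the numerator is 4.8 + 6.0 + 3.3 + 2.0 = 16.1 and the divisor is 6, matching the parenthetical note in the text.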


Report Cards 

The great majority of report cards contain both grades and a list of check
items, . . . in the elementary grades. The report card, no matter what
. . . achievement separate from effort, . . . citizenship, behavior, and so
forth. The report card is an important . . . deserving of the best efforts to
provide understandable . . . “How does my child compare to the other . . . ?”
“. . . friends . . . ?” “What about his behavior in the classroom?”

The local school . . . The important thing . . . communication between the
teacher and parents. The meaning of the grades and symbols used should be clear.

Evaluation and Reality

. . . measurement, and . . . cannot expect perfect tests or
grades. What can the teacher do in light of the tenuousness of testing and
grading? The answer is easy. The teacher should avoid taking tests or grades
too seriously. Nunnally (1964) states this quite well:


Teachers, students, and parents should learn to take the results
from one test, and even the final grades from a whole term, with a
large grain of salt. Such grades should be considered as only highly
tentative indications of the student’s basic abilities, his application to
schoolwork, and his attitudes toward learning. Bad grades during one
term may correctly spell trouble for the future; or they may equally
well mean that the teacher was biased in grading the student, that the
tests were poorly constructed, that the teacher has unreasonably tough
standards for grading, or that the student is going through a “phase”
which he will outgrow later [p. 164].


On the whole, most teachers do a pretty good job of evaluation. The
dramatic exceptions cloud the picture. Think back to your own student days
in elementary and high school. How many times did you think you were
unfairly graded? How about evaluation in college? Probably your college
grades do not seem as valid as your elementary and high school evaluations.
If that is true, maybe it is because the college professor does not have as much
data upon which to base his evaluation: daily recitations, class projects,
observations, and so forth.

The important thing to remember in your evaluation is to attempt to
minimize personal bias or prejudice in awarding grades. At the same time
one must face reality and admit to students and parents that grades are only
an attempt to evaluate performance. Do not be afraid to admit that there is
not always a great deal of difference between a C and a B. However, you will
usually be confident of the validity of the extremes. That is, an A is awarded
on very evident criteria, as is the awarding of an F, especially when it means
a student will be held back a semester or year in his school life. Try to
emphasize learning and de-emphasize the importance of grades. Follow your
grade evaluations with either oral or written descriptions of student progress,
especially with children who are experiencing learning difficulties.


References 

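Lindvall, C. M. Testing and evaluation: An introduction. New York: Harcourt,
Brace & World, 1961.

Nunnally, J. C. Educational measurement and evaluation. New York: McGraw-Hill,
1964.

Remmers, H. H., Gage, N. L., and Rummel, J. F. A practical introduction to
measurement and evaluation. (2nd ed.) New York: Harper & Row, 1965.

Stanley, J. C. Measurement in today’s schools. (4th ed.) Englewood Cliffs, N.J.:
Prentice-Hall, 1964.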



PART SIX 

A School Testing 
and Evaluation 
Program 


CHAPTER 

17 


Planning and Using 
a Testing Program 


This chapter deals with the . . . standardized tests in a comprehensive
. . . testing program. Our discussion is focused primarily on . . .
generally administered to all . . . grade at the same time. . . . other clinical
. . . of what constitutes a sound testing program . . . suggestions. These
suggestions . . . objectives . . . educational goals of a particular school
within a . . . groups of students.

There are, however, . . . guidelines . . . the development and
implementation of a testing program . . .

1. Determine the objectives of the testing program.
2. Select tests that meet . . .
3. Arrange a . . . administering the tests.
4. Arrange for an . . . orientation for teachers.
5. Arrange for a pretest orientation for students.
6. Administer the tests.
7. Score the tests.
8. Record the test results.
9. Use the test results.


Determining the Objectives 

Tests are useful only if they supply the school, the teacher, the student, 
and the parent with meaningful information. To administer tests because 
it is fashionable to do so or because the government provides funds for testing 
is an exercise in futility. 

Although it may seem obvious to the reader that tests should be given for 
specified purposes and then used, this is not always the case. Many years 
ago, for example, this writer had occasion to be in a school system that applied
for governmental funds for testing. The request was granted and the school
administered a great many different tests without ever thinking of what they
wanted to measure. They had received money and they were going to spend
it all. The natural consequence of this “nonthink” action was a great deal of
wasted time and energy. After the tests were administered and scored, no
one knew what to do with them. They were finally stacked in neat piles and
found their last resting place in the school basement. The government had 
spent money; the teachers had spent time; the students had spent energy; 
and the test publishers and authors were a little richer. These were the only 
results of testing in this school. 


A Testing Committee 

The first step in planning the objectives of a standardized testing program 
is to involve appropriate personnel. Almost all the school’s staff and adminis- 
tration have a vital interest and responsibility in planning the testing program. 
Personnel at different levels will be able to utilize some or all of the test
results. 

A standing testing committee is one method of involving staff participation 
in the development of a school-wide testing program. This committee should 
be composed of school personnel who have interests and needs that may be
served by testing, and should represent the special interests of various groups
within the school. If there is a director of testing or research, he should be
appointed to give technical assistance. The guidance counselor and/or school
psychologist can also serve in an advisory capacity. In addition, representatives
from the school administration and instructional staff will need to participate
in the committee to voice their special needs and interests. In a large system
the directors of instruction and curriculum could serve as representatives.



The committee’s work should not stop when a testing program has been
. . . evaluation of the testing program, making changes, modifications, and
additions as the need arises.

The involvement of teachers and other school personnel in the planning 
and directing of a testing program lessens the likelihood of misuse of tests. 
Misuse involves not only the filing away of unused test data but also incorrect
interpretation of that data. Involving teachers in the planning of a testing
program may also help keep them from feeling that outside materials and 
unnecessary work are being imposed on them. 


Practical Aspects 

The purposes of a testing program in a given school system should relate 
directly to that system's own educational goals and needs. First the needs of
the students, the instructional program, and administrative concerns should
be surveyed. Research data will evolve from this process. The school testing
program should be geared to practical needs, not to what a textbook states
or what a consultant thinks is in vogue. Consultants, textbooks, and testing
experts have their place and can make valuable contributions. Their assistance,
however, should come after the local school has decided on its own unique
objectives and practical needs. Theoretical research dealing with curriculum
problems or administrative issues is worthy in itself of the school's
time and energy, but the practical problems of educating children must come
first. 


Structure 

A practical approach must be developed within a rational and cohesive 
system, which provides tools of assessment and defines the limits of the 
program. In planning a testing program, considerations for administration,
scoring, and reporting results must be based on the availability not only of
funds but also of personnel. Teachers often complain of delayed feedback
of test results. Giving tests in the fall and not making the results available
until spring is a good way to lose the support of the faculty. The importance
of reporting test results to teachers as soon as possible cannot be overemphasized.
It is much better to have a small program that the school can
handle than an elaborate one that causes discontent. 

The structure of the program should be definitively geared to specific
purposes. The program should state in clear and cogent terms its purposes
and range of objectives. For example, “. . . problems” is very nice, but to
what . . . refer? A . . . be, “to diagnose problems in seventh-grade
arithmetic.” Thus, a specific purpose has been formulated and further action,
such as selection of tests in arithmetic achievement, may be directly formulated.


When to Begin Planning 

The testing committee needs to plan for testing well in advance of the 
intended dates of administration because: (1) committees should not be
rushed in their important deliberations and (2) selecting, ordering, receiving,
and inventorying tests is time-consuming. Lennon (1962) in addressing
himself to this problem states, 

A testing program should be planned between six months and a year
in advance of the time when it is actually to take place, depending
somewhat on whether it is a spring or a fall program, on local budgeting
practices, on the size of the system and consequent communications
and training problems, and similar factors. When a major program
is being contemplated, one that covers many subjects at many grade
levels with a single battery, it will ordinarily be desirable to think in
terms of establishing a program that will be maintained for several
years. In giving thought to this type of program, it is well to have in
mind such matters as the availability of alternate forms, and the
possibility that revised editions of the test in question will be appearing
over the period of the proposed program [p. 3].

Selecting Tests 

Selecting the tests that meet the school’s objectives and needs is a difficult
task. There are many published standardized tests but only a small number
of quality instruments that will be suitable for local needs. The testing
committee needs to appraise the intended test not only in terms of suitability
for local purposes but also in the technical sphere of validity, reliability, and
standard error of measurement. These basic concepts are crucial factors in
selecting tests. In addition, practical problems such as cost and time to
administer and score the test must be considered. The committee should be
aware of these criteria in their test selections. The committee member who
is the director of testing or research should direct the technical evaluation.
If there is no director of testing and research, then the school psychologist
and/or guidance counselor should serve in this capacity. The committee will,
of course, have available basic references such as test catalogs, Mental
Measurements Yearbooks, Tests in Print, Psychological Abstracts, and
Standards for Educational and Psychological Tests and Manuals, reviewed
in Chapter 6. In the following two sections essential types of tests
needed at different levels and the rationale for their use will be presented.




Suggestions for a School Testing Program 

The recommendations that follow are based on a general core of test needs 
from kindergarten through twelfth grade. It must, however, be stated once 
again that the local school should develop its own testing program according 
to its own needs and objectives. On the other hand, the local school, no 
matter what its specific needs or problems, should consider certain essential
areas in their measurement program. The suggestions that follow are directed 
to these essential elements. 


Kindergarten Through Sixth Grade 
The elementary school is primarily concerned . . . basic skills and tools
in the essential areas of reading, . . .


Reading Readiness Tests (K-1)

The reading readiness test . . . school career. The reading readiness . . .
shortcomings in terms of validity and reliability. Even more . . . , however,
there is a great deal of confusion as to what constitutes . . . readiness.
Still, the first-grade teacher needs to know something about her pupils . . .
The readiness test, with all its . . . upon which to begin a program of . . .
As the teacher becomes better . . . students she will make appropriate
changes based on new . . . meantime, however, the reading
readiness test has served a useful function.

Reading Tests

Marshall McLuhan notwithstanding, . . . is still the most important
learned skill in the education of . . . the student to acquire other
knowledge . . . academic trouble . . . Financial resources permitting, . . .
year beginning in the second grade. If there are limited funds, the reading
test may be administered in the beginning of the third grade to all students. 
Future testing would focus on students experiencing difficulties in classroom 
reading. It is best, however, to measure all children's reading growth each 
year in order to gear the instructional program to individual needs. The 
reason for testing reading in the beginning rather than at the end of the aca- 
demic year is that children may go down or up in their reading levels over the 
summer. The teacher needs to know the level of student reading achievement 
at the beginning of the school year, not levels three to seven months old. 

Scholastic Aptitude Tests 

The scholastic aptitude test helps the teacher understand the range of 
possible student progress. It presents objective data to aid the teacher in 
understanding individual student achievement both in the classroom and on 
other standardized tests. 

The scholastic aptitude test should not be administered earlier than the 
end of the second grade or beginning of the third grade. In the first grade 
a reading readiness test is generally preferable to the scholastic aptitude test. 
The assignment of an IQ or “intelligence” score (even if we call it scholastic 
aptitude, not everyone has received the message) at the beginning of a child’s
school career may do a great deal of harm. Teachers are likely to be less rigid 
about a child’s maturational level of readiness than they are about his mental 
ability. Although we delay our testing of scholastic aptitude until the end of 
the second or third grade, we must remember that even at this point reliability 
is not very satisfactory. 

If a scholastic aptitude test is given at the end of the second grade, we
want to administer another test at the end of the fourth or beginning of the 
fifth grade. If only one test of this type is to be given in elementary school, 
the fourth or fifth grade is preferable because of the greater reliability of 
scholastic aptitude tests at this level. If financial resources for testing are 
plentiful, scholastic aptitude can, of course, be more frequently measured, 
depending on the needs of the particular school. Additional testing will also 
serve to increase the reliability of results. 
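
The gain in reliability from additional testing can be illustrated with the Spearman-Brown formula, which estimates the reliability of a composite of several parallel measurements from the reliability of one. The sketch below is illustrative only: it assumes the repeated tests behave as parallel forms, and the reliability of .60 used in the example is hypothetical, not a figure from any test manual.

```python
def spearman_brown(r: float, k: float) -> float:
    """Estimate the reliability of k combined parallel tests.

    r is the reliability of a single test (0 to 1); k is the number of
    parallel tests whose scores are combined.
    """
    if not 0.0 <= r <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    if k <= 0:
        raise ValueError("k must be positive")
    return k * r / (1 + (k - 1) * r)


if __name__ == "__main__":
    # Hypothetical single-test reliability of .60 at an early grade level.
    for k in (1, 2, 3):
        print(k, round(spearman_brown(0.60, k), 2))
```

Under these assumptions, combining two administrations raises the estimate from .60 to .75, which is one way to see why a second aptitude test in the fourth or fifth grade is worth its cost.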

Achievement Batteries 

A minimal program in grades 1 through 3 would involve measuring 
achievement in the basic skills. These include reading, arithmetic, language, 
and listening. If reading skills are adequately appraised by the achievement 
battery, the school could eliminate the separate reading test or use a different 
type of test such as an oral reading instrument. If monetary conditions 
allow, it would be best to begin these tests in the beginning of the second 
grade and continue testing in the fall of each year. If a school system can 
afford to administer the battery only once during the elementary program, 
the beginning of the fourth grade is best. This would enable the school to 





PLANNING AND USING A TESTING PROGRAM 

plan their instructional program for the next three years, based on the 
academic achievement of the children. With these data, provisions can be 
made for accelerated or enriched programs as well as for remedial instruction
for those children who reveal educational retardation. 

At the upper grades of elementary school the achievement battery should
include tests in content areas such as social studies and science. These more 
specialized areas should not be included until fifth or sixth grade. 


Specialized Tests 

The four types of tests presented so far form the core of a good testing
program. Individual needs of schools and students,

test: Hotvever, the east majority of school ^Jf^Vrare administered only 
be administered routinely. 


Seventh Through Twelfth Grade

In the junior and senior high d«isto^^^^ that must be 

the increase in the number of facing the student at this let el are 

made. Among the most impor j v„ational goals; and nhelhcr 

what courses to take and when, edura prerious 

to go to college or learn a trade^ content Us. The specific 

SCHOUSTIC Aptitode Tests imporiance al this level of 

ifoiSyr sXS aptitude t«, nan he administered dun 




between seventh and twelfth grade, the most appropriate time is the eighth
or ninth grade. If monetary resources are adequate, a scholastic aptitude test
should be administered at the beginning of the seventh grade, at the end of
the eighth grade, and again at the beginning of the tenth grade. Those students
who are going on to college will probably take the PSAT in the eleventh
grade and/or the SAT or ACT in the twelfth grade. 

Special Aptitude Tests 

Special aptitude tests can provide important information for students who 
are not planning to go on to college as well as for those who are planning to 
major in special areas in college. As you know from our previous discussion
of special aptitude tests, there are two types: (1) separate tests that measure
specific aptitudes and (2) those which combine measures of different aptitudes 
into a single test battery, such as the Differential Aptitude Test battery 
(DAT). 

If a battery such as the DAT is given, the school may also use it as a 
scholastic aptitude test. In the ideal situation, however, it would be best to 
use the scholastic aptitude score from a battery such as the DAT only as a
check on other measures. 

Special aptitude tests have the most relevance for the student and school 
beginning with the ninth grade. A test battery such as the DAT is highly 
recommended. The battery should generally be administered during the 
time of year in which students will not be taking other tests and early enough 
in the year that results will be meaningful. Generally, the middle or later part 
of the first month of school is a good time. The ninth and eleventh grades 
seem to be the most propitious levels for administering special aptitude tests, 
because there is then enough time for the student and school to make decisions 
on course and vocational plans. The same tests should be given in the 
eleventh grade as were given in the ninth, using a different form in order to 
check estimates of aptitude and note changes in performances. 

Achievement Tests

Although the teacher’s classroom tests will be the final word in estimating
grades and school achievement, the standardized achievement test can be of
assistance in planning course selection and vocational goals. When achievement
tests in specific content areas such as social studies, English, science,
and mathematics are used to facilitate educational decisions, they should be 
administered in the eighth and the tenth grades. 

If these tests are to be used for administrative purposes rather than 
guidance, the ninth and eleventh or twelfth grades are most appropriate. 
From data gathered at these times, the administration has a record of growth 
and achievement of the student body as a whole. It is of no help, however, 
for individual guidance to present students with these tests when it is too 
late to make many changes in their educational plans. 




Reading Tests

Reading is as important at the secondary level as tt is m the > 

attention for those children who nee p grades of junior 

be made. 

Interest Inventories dm to be used with 

Interest inventories provide of data can assist the 

aptitude and achievement test resu ts. ^ , /administer the interest 

student in his vocational pane, ns may be considered in 

inventnry is in the ninth geade sn that interest pa 
planning courses of study. 

Personality Inventories to all students is not rccom- 

The administration of po„ona!ity inventotiM, along ™th 

mended, eveept for research be administered onl 

Sinieal Uniques such « J^pon referral and isith the full 

by a qualified clinical or ^f/proft^^ 
knowledge and written permission 

Scheduling of Tests „„blishing objectives and 

The test committee has been h”-*. ’’.J^radministtation, Koiv some of 

sclTedng tests -d “PP/f *n1e "T 'IhSon "nhe 

SIJS .,„„,siead.„tagesnf.«^^^^ 

student. When ''“;:.'’„tnints. (2) The their 

testing is necessary /or students lose have 

During a long Adb; on the other hand Others 

"glS-diJa*^ 



More time is available for the administration and scoring of tests 
and the analysis of results. End-of-year pressures can result in tests
being filed away without being used. (5) Up-to-date test results can be
used as a basis for grouping students for differentiated work or special
corrective instruction. Moreover, scores on survey tests serve as a
starting point for the use of supplementary diagnostic methods to 
determine specific retraining needs [p. 499]. 


Spring testing does not have as many advantages as fall testing. It does, how- 
ever, enable teachers and the guidance staff to base the programming of the 
following year on objective data. It is also a good time to administer achievement
tests, as the school administration is usually attempting to gauge its
standing and progress at a national level. The administration may also be 
interested in data for research concerning the effectiveness of teaching. 

The choice of the time of year for testing is also dependent on local conditions
and needs. If, for example, a school system is divided into an eight-year
elementary and four-year high school plan, the receiving high school
will want to test eighth-graders in the early spring in order to make intelligent
decisions regarding number and types of courses to offer. In addition, 
counselors will want to talk to students and parents concerning course selec- 
tions. This is especially true when homogeneous grouping is practiced. 

Scheduling tests is a complex task when the whole school is involved. 
In an all-school testing program at the senior high school level, classes should 
be discontinued during the hours of testing. The mornings are generally 
considered the best time for testing because the students are fresh and 
probably at their best. The afternoon classes can be held as usual, or all the 
classes can be scheduled with shortened time periods. 

The following is a testing schedule for Brook High School. Classes at
Brook resume at their regular times at 11:10 a.m. The tenth, eleventh, and
twelfth grades are similarly scheduled.


Brook High School Testing Schedule

Ninth Grade

Date       Time         Test                       Student
Sept. 23   8:45-11:00   DAT Form L (Booklet I)     A-M
                                   (Booklet II)    N-Z
Sept. 24   8:45-11:00   DAT Form L (Booklet I)     N-Z
                                   (Booklet II)    A-M
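
The rotation in the schedule above, in which two alphabetical groups trade booklets on the second day, can be sketched in a few lines. This is an illustration only, not the school's actual procedure: the function names are hypothetical, and the split assumes last names that begin with an unaccented English letter.

```python
def group_for(last_name: str) -> str:
    """Assign a student to 'A-M' or 'N-Z' by the first letter of the last name."""
    first = last_name.strip()[:1].upper()
    if not (first.isascii() and first.isalpha()):
        raise ValueError(f"cannot classify last name: {last_name!r}")
    return "A-M" if first <= "M" else "N-Z"


def rotation(days, groups=("A-M", "N-Z"), booklets=("Booklet I", "Booklet II")):
    """Return (day, group, booklet) rows, shifting the booklets by one each day.

    With as many days as booklets, every group takes every booklet once,
    so each booklet is needed for only one group at a time.
    """
    rows = []
    for i, day in enumerate(days):
        for j, group in enumerate(groups):
            rows.append((day, group, booklets[(i + j) % len(booklets)]))
    return rows


if __name__ == "__main__":
    for row in rotation(["Sept. 23", "Sept. 24"]):
        print(*row, sep="  ")
```

Because each booklet is in use by only one group on a given morning, the school needs copies for half the grade rather than the whole grade.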


To save money Brook High School divided the ninth grade in half by
assigning students whose last name began with the letters A through M to
one group and the rest of the students to the second group. Thus only half




after the second test fifteen extra minutes 

“thT:.. tesiing tvm be accomplish Wo“ 

Three hours should be the "^/“^fove, a 

level. At lower levels it is Consult the test manual 

days so as not to tire young c i g^fiool staff 

for methods of dividing the time P'“P J' , j jtig a good idea to reproduce 

andthestudentsknowwhentcsmBissAcduU^^^^^^^^^^^ 

a schedule and hand it out to all t«chem a d 

Tests should be . W' game or dance. Tests should 

tests before a holiday or ^e day befor a minimum of distraction, 
be scheduled at a time when there win 


Test Orientation 



General OrienMton far Family c„g „,ad a refresher course in 

Each year before testing is to '“y presented no sooner than 

Standardled testing. TMs ^'^s Wo 

two weeks, and no later than program should be “ 1“' 

The director of this in-sem ?.f j y testing, such as t 


H;2S.»SS;E?£S-.r'^i 

itor effort. (See Chap e o a ^ ^gg„,ia,g: 

he program should cmertn 

1 \Vliy the school is administering o™'*” „,gan. 
k&e tests m.nhx^au^^^^^^ 

,„d counselors available. -0 




tests. This is not a real problem if teachers are properly prepared. A special
in-service program for test administrators should be conducted after the
general in-service program previously discussed.

The selection of test administrators in a large system can be made by the 
department chairman. He should be sent a brief communication stressing 
the importance of testing and the vital role of the test administrator. A list
of selection criteria should accompany the communication. These essential
criteria are as follows:

1. Reads fluently. 

2. Has good pronunciation. 

3. Follows directions scrupulously.

4. Thinks on his or her feet. 

5. Communicates well with students. 

6. Has some interest in doing the job. 

The training or orientation of test administrators should encompass the 
following objectives:

1. Complete understanding and familiarity with the test and manual.

2. Understanding of and appreciation for standardized test procedures. 

These objectives may be accomplished by having the test administrators 
take turns administering the tests to each other. If there are too many teachers 
to allow enough time for this approach, the director of the in-service program 
can administer two to four items of each test to the group. This will give them
practice in what the test is all about. The director’s method of reading the 
directions and speech pattern can serve as the model for test administration. 
A full question and answer period should follow. 

In Chapter 3 we discussed proper test administration and scoring procedures.
Several points are worth noting again.

1. Test administrators must follow the directions verbatim. 

2. Questions should be answered within the context of the test 
directions. 

3. Two time pieces should be used: the wall clock and a wristwatch
or, in tests with short time limits, a stopwatch.

4. Test administrators should circulate around the room after testing
has begun to see whether there are any problems.

5. Special care should be taken in scoring; always spot-check scores.

Pretest Orientation for Students 

Students should be called together in an assemblylike program. The reasons 
for testing and what the results may mean for the students should be candidly 




presented. This general overview of testing may be conducted by one of the

’’rh“"?d:::aLrsmay™ntm^^^^^^^^ 

learn to respond ‘“at explain bom to take a standardized 

ssr ksSs-z.. -a . 


Recording Test Scores 

There should be a test record sheet in “ to?Another copy 

In the actual he accurate, but they must “"'‘■y 

accuracy Not only ^^.^ral E'ancc A good test record 

rntThtuld^iSude^thr—^ata: 

1. Full name of *' 

1: S Koffir^sefu. and meaningful for aral>u,s. 

A Sample Testing Program hanics of dev eloping a tntmg 

Our 'rta' TirWlS'"”'''"/ 

praS' auggestions “_;’j ‘^J^X ®?'''”!r^"Ture"Kh«' 

pro^am of t'^‘= illusiraiion of an cndorsctncni of 


ractical suggestions Community Sc»«ool Corrora 

rogram of the Sou jHustranon of the endorsement of 

idiana, is pr^^n program in no way oon , j^j.. program. It 

'he reproduction of t ^ P construed «* ^pie and a tesimg 

le program, nor j j of manj • , program hi'c 

i5l=e:s£te=...-.- 

Figure43sbo»ulhea 




Figure 43. Chart showing the evolution of a testing program. (From Program of
Testing. South Bend Community School Corporation. Reprinted by permission.)



Note the length of time taken. The Testing Program Study Commission¹ was
composed of ten people representing the following school areas: guidance (4),
school psychology (2), teaching (3), coordinator of testing (1). 

In addition, other school groups were consulted and their recommendations 
incorporated into the final product.® 


I. TESTING PHILOSOPHY

The test information is a portion of the school's total ■■'f''«-l 
nathering program for accumulating an objective record of a pjil 
L he nrogrcLs through school. This progress record contains 

working with test information o i student maximize 

(1) to use the informa- 

educational c.\pcriences. W ^ ^ of interpretation and 

to professional on a need-to now Community SJool 

•fhetoult^ting program of The fiorir 

Corporation is du'ided int / j Ttstmg. 

rJlhs noeram, Suppimenlal Tertn-., an p 

\. Basic Testing Prasrom am is to provide information 

The purpose of the basic , „als are: a college degree 

about students whose a high school diploma, 

: S?han a S sAocUipbrna. This additional information ptov, 

J less rnaiJ 6 Comniission: 

.Tbefo.,owh,gpe->.-.S 5 i^^ 

ss;i’sS%£Ws:TS^^ 

school counselors. i,,icn of Ihe |°5'V«:;nir. South Bend Community 
> Reprm'f'l.^Si (From 
South Bend, >"^“ 968.) 

Corporation. Sprmg. 




about students by their own efforts is obtained, using standardized 
tests, only when that information cannot be obtained in any other 
way or cannot be obtained as economically as through the use of tests.
The basic group testing program is more than a list of standardized 
tests and the dates for administering them. It includes: 

... an orientation for staff, students, and parents of the rationale 
for giving the test and the use that may be made of the results. 
... a strict adherence, by test administrators, to the testing conditions
described by the test publisher in the testing manual.
... provision for the child or class to be given the teacher’s (or
tester’s) undivided attention during testing period. 

. . . direction for careful observation of the students during testing 
to detect and report any significant behavior which might affect 
test results but which might not be evident in the test results. 

. . . provision for make-up testing of students absent during regular 
testing sessions, make-up testing for those whose test results 
or behavior seem inappropriate, as well as testing of students 
new to the school who do not have immediately available 
test results on file. 

. . . immediate and accurate test scoring, using electronic data
processing where possible, to insure a fast return to those using
the test results. 

. . . reporting the results in a manner most useful to the staff, 
students and parents for making pre-school, in-school, and post-school
decisions.

... a means for correctly interpreting the results to those who will
use them, the most important user being the student. 

... a provision for permanently recording and storing all standard- 
ized test results for future reference. 

. . . provision to safeguard individuals’ rights through the procedures
used in transferring test results outside the school
corporation. 

... a system for purchasing, storing, distributing, and collecting
test materials economically. 

. . . provisions for a continuing evaluation of the total testing 
program, considering the concepts stated above.

2. Supplemental Testing

In addition to the basic testing program, optional group testing 
materials and services are provided for those staff members needing
additional student information that can be provided by testing. A
list of approved tests available for use in supplemental testing as well
as the procedures for requesting the tests are distributed to each 
school annually. The concepts for using supplemental testing are the 
same as those included above in the basic group testing program. 





3. Special Testing and Assessment

Individualized testing and assessment is provided to fill gag n 

nose learning f students nerv to the school sjstem to 

et cetera; tests admmisterc ^ rMriincss for educational 

help determine their j i„ edultional research pro- 

tvith coliege-hound students! e. 

cetera. 

■^^r^tsamusedm.— 

‘•“^‘sscoso pnrpase 

'nerdatthatgirdelevelorminov^^^^^ 

Si5lg!^£'& 

later success. , he curriculum is 

of progress a, predtctm j l„rt,o„ „f the ^ 

During ‘ h child tam to „ „|,„e his 

directed ton-ard ''''P'"® ^ direction of an adulh . ^m. 

groups of children f'^stomed to not a 

vocabularj', and to ^ i^,|y insD ^ ,j. 

Test ^^["[““"“testiha P'-*™” I’V'-irLartmcnt. Test results 
ministered by the J 


October 1 b“t f 
they are socially. ' 



It also protects other children from entering kindergarten when 
they are not socially, emotionally, and mentally ready for this ex- 
perience. 

Group testing in kindergarten comes near the end of the year 
and involves a test of general mental ability as well as a measure of 
readiness to begin reading and number work in the first grade. 
These measurements of reading and number work readiness supplement
the information given to the first grade teacher by the KINDERGARTEN
teacher in a conference for the purpose of individualizing
instruction. The results of these measurements are never used as the sole
criteria for KINDERGARTEN retention or for forming first grade groups.

During the first, second, and third grades the major instructional
consideration is given to developing and practicing the skills
necessary for reading and arithmetic. Because of the constant practice
and teacher observation of the development of skills of each
child, the most appropriate standardized test at this time consists of
diagnostic reading and arithmetic tests administered and scored by
the classroom teacher at those times when such information is most
useful. The usual time for this testing is when the instructional group
has completed the first, second, or third grade reading programs
and may not necessarily coincide with the end of the year. There
usually is a group within the classroom being tested at a time,
probably an instructional group. When there are two or more instructional
groups, each group is tested at the time a grade level program
is completed. The skill development of each child observed by the
teacher and analyzed with the use of diagnostic testing information
is then systematically communicated to the next year’s teacher to
provide a valid basis for continuity in reading and arithmetic skills
development. This test information like all other information gathered
by the regular testing program is permanently recorded in the
child’s record.

Near the end of the third grade a measure of language arts and 
mathematics skills development is administered to all children to
provide a summary of primary grades skills development as well as a 
basis for the educational program at the beginning of the inter- 
mediate grades. 

In grade four, reading and arithmetic skill development is still 
an important part of the curriculum, but broad concepts have been 
introduced in the areas of science, social studies, language, and 
vocabulary as well as the development of sound patterns of study
skills. Near the end of this grade the child’s first standardized measure
of educational progress is taken in each of these achievement areas.
At the same time, a measure of ability is given to determine the general
operating level of each student. A comparison between ability and
achievement is provided at this level to draw attention to those




students for whom there is a significant difference between test

summarized by broad concep students in her classroom 

teacher receives this and tveak- 

at the beginning of that schoo y individualized instruction. 

nesses of each child niay be focus p folloacd in 

The same pattern of "““"a.ion in l.cm-anallted 
^o^rbeing pto"v5ed teachers at the beginning of the aUetna.e 

^ttr^e of tested hS’hi^ 

NINTH grade to be used ■" ^P'"S f„„her clues in prepanng 

program of studies as veil as P™'' ^ summarited by school 
vocational objective. Th« „( cMtsc offerings needed in 

results, provides a basis ‘^“'^'^^ofgroupsofstudents. 

,heschooltodevelopthesp^^tak"«»>fi N.ND. grade 

Early in the seventh S'”'’' , a,eas is given each studen 

„elre of tested lnt.r«t m s.udenfs educa.iona 

,0 be used in fta'*" P’“""“Vas provide a basis for serious study 

and f^upational goa s a 

concerning suitable areas 

III. TEST RESULTS AND THEIR USE

requires sign'd Pa","“>^ l^e becomes a part 

pkUthertlm’Jo'”- 


• u* ttir classrooms the 
■ sTsr.he teacber in dUrororing. ,^ e a, a 

a. loassisi ^ranccofi 


“tienq"^" '“"''I” TXvemen. osTr a long P"-* 
d. To help reveal 




students within subject areas, or discover special talents 
individual students display which might be considered in daily 
instruction. 

e. To aid in the identification of individual students who are in
need of additional testing, specific program assignment, or 
other consideration. 


2. Counseling and guidance use: 

a. To interpret test results to individual students in order that 
they may gain a realistic self-concept in relation to their tested 
abilities, interests, achievements, and aptitudes, 
b. To interpret test results to parents and faculty in order that they
may gain a realistic concept of the child’s tested abilities, 
interests, achievements, and aptitudes. 

c. To supply a developmental pattern of objective data for the 
cumulative record and subsequently for other schools and 
colleges. 

d. To assist in the optimum class placement of students. 


3. Administration (within the building) use:

*a. To provide a summary of information to permit judgment of
the extent to which the total school is meeting its educational 
objectives. 

*b. To indicate strengths and weaknesses in the achievement of all 
students in the school in relation to over-all measures of student
ability. 

c. To aid in determining the courses to be offered in the school. 

d. To serve as an aid to grouping children.

*Tests are never used as the only data to evaluate the effectiveness
or ineffectiveness of teaching because of the questionable validity
in using standardized tests without consideration of other factors
which contribute to student growth.

4. Administration (central office) use: 

a. To help in determining the effectiveness of the curriculum in 
achieving the schools’ over-all objectives.

b. To screen and interpret generalized test data that may be used 
in public relations programs. 

c. To provide information to other agencies such as juvenile court, 
guidance centers, colleges and universities, etc. 

d. To aid in providing information for surveys, follow-up studies,
research studies, and other school-wide studies needed period- 
ically for school administration. 

e. To act as a clearing house for individuals and organizations that 
wish to evaluate or analyze data about or from students.




IV. MECHANICS OF TESTING

The Guidance Department of the South Bend Community School
Corporation is responsible for the coordination of the total group
testing program. The department coordinates the program in

=£=Hs'z^.'£r.r;S;.s 


I. Orientation to Testing , ^ ,5 j,,, „„e 

Each of the elementary, junior high tfd hjg^^ ^ 

person designated k a„d measurement training, he 

is present in the school and qujUSrf person 

may be appointed to .,s..h«,lprincipala5theschoors 

r;:;“ifon. “ “■ 

feeder areas to discuss the folloumg. 

1 What the test measures 

I- ^r^:ra.ioV:n"h! students and testing room for testing 
4; LogUtics of 

^■r„ra»Su.. 

7 . SdtTf US “g.h= t«t^«”;;;‘„ ,j™„is,er tests should take 
At least tsro sveeks^^pH- 

“slots tesSwtJtoR^^^^ to discuss the 

• —en are administcf^ 

2, Test 'f the basic «"!"? f,,%^dents- oun teacher 

Standardired t^ p„„p, bj 

to students in cla-«r^ 




(who has been prepared in a workshop session). Tests should be 
scheduled for testing on Tuesday, Wednesday, or Thursday. 

To aid in uniformity, tape recordings or the public address system
may be used to give directions.

To aid in determining any invalidity of scores due to the testing
environment or examinees’ behavior observed during testing, an
administrator’s observation check list (See Figure B) is provided for
use during testing. Any report of results which contains probable 
invalidity will have a notation stated, “May be invalid.” 

Make-up testing shall be conducted during the regular school day 
within a reasonable time after the child has returned to school. 

Any new pupils entering the school without test data shall be
administered the minimal tests which other pupils on his grade level 
have taken during the year. 

V. TEST SCORING AND REPORTING

All tests in grades four through twelve are machine scored directly
from the answer card provided for student use. Appropriate reports 
are prepared from the test results for use by teachers, counselors, 
administrators, coordinators, directors, or other certified employees. 
Examples of these reports are used in the testing workshop prior to 
actual testing. 

Notice that the test has been given is forwarded to parents along
with the report card immediately following the testing date en- 
couraging the parents to visit the school and discuss the test results 
in the context of all other information relevant to the progress the 
student displays in school. 

Each student in the junior and senior high schools is encouraged 
to discuss the present and past test results with his counselor as one
means of increasing self-understanding and facilitating decision- 
making. 

All diagnostic tests in grades k through three and all supple- 
mental testing are scored in the building under the direction of the 
principal or counselor. The kindergarten measures of general 
mental ability and reading and number readiness are hand-scored at
Management Information Services. The third grade measure of
language arts and mathematics skills development is also hand-scored
there. 

Press labels are prepared for every standardized test given and
returned to the school along with any other report forms so that each
student’s test results will be recorded in the cumulative record. This
guideline does not include tests that accompany textbook series but
does include the results of special testing which are recorded by
hand in the cumulative record. All test data on pupils from other
school systems are transferred to the cumulative record of the student




in the cumulative record. j- results in the cumulative 

In addition to permanent y „„ ,bm 

record, the test results are •■'«> ,hst all or part of the 

cards at Management Informanon Seo « “ “ 
information may be recalled eas.ly at a later 

VI. TEST SCORE INTERPRETATION 

All standardised 8™“? .*“* f" toj.*«°and juvenile authonlies 

to students, parents, 'T are felerfreled by the teachers, 

on a need-to-knmv ''““ T^alinUtrators using grade equtvalent 

counselors, coordinators, and/or ato norm basis T/ie 

social, and emotional grotclh. 

vn. TEST MATERIAL ECRCHASmO. ^„,a, ,„,ing 

"’tdtl rpSion manuals) « ■> S' 

struction and mt r months ''hen p 

companies during gained for ^r' 1 or counselor 

advantage of bulk ra^ /'^ble to any .dJumnnl 

A list of s“PP'''"'"MT,s or grotP' for siTplrmental 

,vho tvishes to tesMnd j E«ry requ« j J|,p,„priaic 

elementary or sccon - r> 

supplement 

1. Requests ^r s^^^P Dell . 

Tn ” P”'*"- following rfitS" 

2. Tbe r''lTmTT«-.f»-.P-j' r'Tntrfnr n^ng 

making requ'*'' ,eqnested totm^ „( ,|,e lest 

to be tested, da i„f„rmati«n. use to oe 

Slrandnny-'*"””""'""^ 



[Table: Schedule of Tests, Primary School Level]

[Table: Schedule of Tests, Junior High and High School Level]

VIII. TESTING PROGRAM EVALUATION

A test review committee ^ concepts and testing 

,oTn The committee »s gum^a ) committee «s 

IllLd in the testing pMril,,!' three men, bets 


to: — ^ovUool or nv 

. ,hi, ,oi«n. ' J' 

Community ^ „, Imtn’t"*”:’" Z' ,,6ea coUtrK “ 

“"“JTthff nw 






1 . You should be 

orerr/o^uS - - ^ 

;:frs\idTS»Hho.4ischoouGivi^^^ 

during the test ;„dmdual questions quietly (to the m- 

4. You should respond standardized procedure. _ 

dividual only) in b^avior tvhile they are b''"8 '“I'J 

5. Keen observation of „s ate to be meaningful. T ou 

is of utmost importance if ‘“If nation of test scores nail, to a 
Luld bear in mind ."el "" 

large extent, depend 0"h0Ut»o.^^^^ 

aud recorded dur, ObseD.ion CbecbU:: 

;::';Srsr;«^j-s2ring testing Miriam 

The information to be reco. 

sectionst , jtudenW being 

,. .bat ivould influence the ouieome of all 
1 . A section that 

'“'imirruptions ,i 

‘:s=EigSK.'ia“’ 

Artministration. c»g ^i,,mns for placing 


4 , Teacher’s comments. 1 
condition 


,,S to Students tutiP"'"'^ 

Reporting Test Besu -.h the problem of hoiv best to convey 

^ l„«arcfaceil'Vt'';*a r j they may 

Teachers and «u^ parents. If ^ey anything, the 

test results to ,bc studeut. U * > llcidcs, many parents 

s;.-: =■ 

has a tight to kn 




in discussing the problems of telling parents about test results, presents four 
philosophical premises that should form the basis of school policy: 

1. Parents are entitled to information related to their children’s
progress in school, especially as it relates to future educational
or vocational plans.

2. Test information given to parents must be expressed in understandable
terms.

3. Test results are best revealed in terms of a simple scale broadly
based. (Some common means of interpretation, such as intelligence
quotients and grade equivalents, appear to be more
precise than they really are.)

4. The information should have uniform meaning to parent and
educator and demonstrated relevance (validity) for the purpose in
mind such as grouping, promotion, and guidance [p. 1].

Specific IQ scores should never be given to parents. Grade equivalents or 
placement scores are not a satisfactory means of reporting, because misunderstanding
often results from their use. (See Chapter 5.) Percentiles and
stanines are most appropriate for conveying test results to students and 
parents. 
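The relation between a percentile rank and a stanine can be sketched in a few lines of Python. This is an illustration only: it assumes the conventional stanine bands of 4, 7, 12, 17, 20, 17, 12, 7, and 4 per cent of the norm group, which are not stated in this section, and the function name and boundary handling are hypothetical choices rather than any publisher's scoring rule.

```python
from bisect import bisect_right

# Assumed convention: percentile ranks at which stanines 2 through 9 begin,
# derived from bands of 4, 7, 12, 17, 20, 17, 12, 7, and 4 per cent.
STANINE_LOWER_BOUNDS = [4, 11, 23, 40, 60, 77, 89, 96]


def percentile_rank_to_stanine(percentile_rank: float) -> int:
    """Return the stanine (1 to 9) for a percentile rank between 0 and 100."""
    if not 0 <= percentile_rank <= 100:
        raise ValueError("percentile rank must lie between 0 and 100")
    # Count how many stanine boundaries the rank has reached, then add 1.
    return bisect_right(STANINE_LOWER_BOUNDS, percentile_rank) + 1


if __name__ == "__main__":
    for rank in (3, 25, 50, 90, 99):
        print(rank, percentile_rank_to_stanine(rank))
```

Because a stanine covers a band of percentile ranks rather than a single point, it is a broad scale of the kind recommended in the third premise above.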

Ricks (1959) poses the question, “Are any numbers necessary?” Although
he would not impose a ban on using numbers in reporting test data, he notes
that some very good counselors do not present any numerical data at all.
Verbal techniques such as, “You score like people who ...” or “Your son
(or daughter) scores like students who ...” are an excellent way to communicate
information about test scores.

This mode of test interpretation in an actual situation might take the 
following form: 



Counselor (or teacher): Mary, you score like people who have a
difficult time in college. On the other hand, your scores reveal a great
deal of promise in commercial areas such as filing and typing. You
may want to consider a secretarial school after high school graduation.

Counselor (or teacher): Your son Wally scores like students who
find the Ivy League schools difficult but seem to manage well in smaller
private four-year colleges.

Counselor (or teacher): You score like students who do better in
algebra if they take general mathematics in the ninth grade and
algebra later.


In conveying test data to students and parents, the teacher or counselor 
should be sure to emphasize that test scores are tentative and that their real 
meaning lies in using them with other information such as school grades, 





ability in autistic children can read the extensive writings of Dr. Bruno
Bettelheim. In The Empty Fortress (Bettelheim, 1967), for example, the 
problem of determining the intelligence of autistic children is explored. 

The teacher and counselor must bridge the gap that separates them from 
students and parents in order to realize the full effectiveness of tests. This
can only be done if a concentrated effort is made to present information in 
relevant and understandable terms. 


References 

Adams, G. S. Measurement and evaluation in education, psychology, and guidance.
New York: Holt, Rinehart and Winston, 1964.

Bettelheim, B. The empty fortress. New York: The Free Press, 1967.

Durost, W. N. How to tell parents about standardized test results. Test Service
Notebook, No. 26. New York: Harcourt, Brace and World, 1961.

Lennon, R. T. Selection and provision of testing materials. Test Service Bulletin,
No. 99. New York: Harcourt, Brace and World, 1962.

Ricks, J. H. On telling parents about test results. Test Service Bulletin, No. 54.
New York: The Psychological Corporation, 1959.

South Bend Community School Corporation. Program of testing. South Bend, Ind.:
South Bend Community School Corporation, Spring, 1968.



APPENDIX

A

Major Publishers of

Standardized Tests*


^cwr—'— - 

(ACT) 

5. Box 16S 

,«a City, !o''» 5“'*° 

319-338-3671 

American Guidance Service, Inc.

Publishers’ Building

Circle Pines, Minnesota 55014

612-786-4343

-„eBoBb3-31erril.Co™P»>-.>- 

300 West 62 nd Stre 

:„dianapol.s nd.ana 

sn-AXi-Jio® 

. Measnteme'"' 
Bureau otEducW college 

3«-1200 


„„eau of Educational 

Service 

C-6 East Hall 

319-353-38-3 

C-H-^'n'SanACue 

-'ll’ ^“' llBno.s 60624 

sgrir- 

'■w- 774 -IH 3 ‘’ 


.343-1200 of chats®' 

* Test catalogues are sent on request.





Center for Psychological Service 
1835 Eye Street, N.W.
Washington, D.C. 20006
202-541-4465

Committee on Diagnostic Reading 
Tests, Inc. 

Mountain Home, North Carolina 28758
704-OXford 3-5223 

Consulting Psychologists Press, Inc. 
577 College Avenue 
Palo Alto, California 94306
415-326-4448 

Education-Industry Service
1225 East 60th Street 
Chicago, Illinois 60637 

Educational and Industrial Testing 
Service 
P.O. Box 7234
San Diego, California 92107 
714-488-1666 

Educational Records Bureau
21 Audubon Avenue
New York, New York 10032
212-LO8-6700

Educational Testing Service
Princeton, New Jersey 08540 
609-921-9000 

Western office: 

1947 Center Street 
Berkeley, California 94704 
415-849-0950

Midwestern office:

990 Grove Street
Evanston, Illinois 60201
312-869-7700

Follett Publishing Company
1010 West Washington Boulevard
Chicago, Illinois 60007
312-606-5858


Grune and Stratton, Inc.
381 Park Avenue South
New York, New York 10016
212-MU6-2077 

Guidance Testing Associates 
6516 Shirley Avenue 
Austin, Texas 78752 
512-GL2-6969 

Harcourt, Brace and World, Inc.
757 Third Avenue
New York, New York 10017
212-572-5000 

Houghton Mifflin Company 
110 Tremont Street 
Boston, Massachusetts 02107 
617-423-5725 

Institute for Personality and Ability
Testing (IPAT)
1602 Coronado Drive 
Champaign, Illinois 61820 
217-352-4739 

Ohio Testing Services 
(Ohio Scholarship Tests) 

Division of Guidance and Testing 
State Department of Education 
751 Northwest Boulevard 
Columbus, Ohio 43212 
614-469-4590 

Personnel Press, Inc. 

20 Nassau Street 
Princeton, New Jersey 08540
609-924-7000 

Personnel Research Institute 
Western Reserve University 
Cleveland, Ohio 44106
216-CE1-7700 

The Psychological Corporation 
304 East 45th Street
New York, New York 10017
212-679-7070 




Psychological Test Specialists 
Box 1441 

Missoula, Montana 59801 


Psychometric Affiliates 
Chicago Plaza 
Brookport, Illinois 62910 
312-233-5133 

Richardson, Bellows, Henry and
Company, Inc.
355 Lexington Avenue
New York, New York 10017
212-682-6300 

Scholastic Testing Service, Inc.
480 Meyer Road
Bensenville, Illinois 60106
313-766-7150 

Science Research Associates, Inc. 

259 East Erie Street 

Chicago, Illinois 60611
312-944-7552

Sheridan Psychological Services, Inc.
P.O. Box 837
Beverly Hills, California 90213

213-474-1744 


State High School Testing Service for 

Indiana 

Room 109, A.E.S. Annex
Purdue University
Lafayette, Indiana 47907


Teachers College Press 
Teachers College 

Columbia University 

New York, New York 10027
212-870-4215

University Bookstore, Purdue University
360 State Street
West Lafayette, Indiana 47906

743-1288 

London E.C.4, England 

Western Psychological Services
213-GR8-6730 



APPENDIX 



Glossary of Common 


Measurement Terms*



academic aptitude (scholastic aptitude) A combination of inherited and acquired
abilities needed for schoolwork. 

accomplishment quotient (AQ) The ratio of educational age to mental age;
EA ÷ MA. (Also called achievement quotient.)
achievement age The average age on an achievement test. If a child obtains an
achievement age of 11 years 8 months on a reading test, his score is equal to those
of children of 11 years 8 months, who on the average receive a similar score.
achievement test A test that measures the amount of knowledge or skills that a child 
has learned in a particular subject field. 

age equivalent The age for which a given score is the real or estimated average score. 
age norms Values considered as average on a certain test for children of various 
ages. 

age-grade table A table showing the number of students of various ages in each grade.
alternate-form reliability The closeness of correspondence, or correlation, between
results on alternate (equivalent or parallel) forms of a test; thus, a measure of
the extent to which the two forms are consistent or reliable in measuring whatever
they do measure, assuming that the examinees themselves do not change in the
abilities measured between the two testings. 

* This glossary includes some of the common terms used in measurement. It has
been reproduced, with some revisions and additions, from Test Service Notebook
No. 13, with permission, published by Harcourt, Brace & World, Inc.


• knoitWee md proficiency with Irainmg. It 

The obility ,•» “1"'" " ’cqnhcd sUUs. 

a combination of inborn capacitj or ata y / ^ 

■“Hli'iSS;-:*,...-.-- 

“"which produces an averajN. ; ,„d„ey. The three most 

re«ral“am of the battety. „lationship, orjmOB 

Pearson, originator of Unless otherwise spe 

usually means tnc of relationship, to 10“. 

,00, denoting complete absene negative. ..-olete or fill ‘ 

Sretr''""'”;„,„e.ion insure for 

eo„ccn-on /or ^ '"“2,i„le.choiee q"'’™”;: “ "nUge g”®"’? ' 

students of h'g'' 3 high e“"'“'?:„(l„ence on the other. 

.ha.di.ide 

“■si::;sAre:s. o 

de:tr;rr-p'»'-“'’"^. isofnehtld'an^^^^^ 

4»orfS t«, A te. ,„UP, snch as atnden.s of a ^ven 

and that determines 'h;;; ^ ^ speofied gr 

difficully ^J'h/answer an item eo-e" 
age or grade, wiw 





discriminating power The ability of a test item to differentiate between persons
possessing much of some trait and those possessing little.
distractor Any of the incorrect choices in a multiple-choice or matching item.
distribution (frequency distribution) A tabulation of scores from high to low, or low to
high, showing the number of individuals that obtain each score or fall in each
score interval.

educational age (EA) See achievement age.

equivalent form Any of two or more forms of a test that are closely parallel with
respect to the nature of the content and the difficulty of the items included, and
that will yield very similar average scores and measures of variability for a given
group.

evaluation program Such a program involves the use of testing and nontesting
instruments and techniques for the appraisal of growth, adjustment, and achievement
of the child.

extrapolation A process of estimating values of a function beyond the range of
available data.

factor In mental measurement, a hypothetical trait, ability, or component of ability,
that underlies and influences performance on two or more tests, and hence
causes scores on the tests to be correlated. The term “factor” strictly refers to
a theoretical variable, derived by a process of factor analysis, from a table of
intercorrelations among tests; but it is also commonly used to denote the psychological
interpretation given to the variable, i.e., the mental trait assumed to be
represented by the variable, as verbal ability, numerical ability, etc.
factor analysis A statistical technique for analyzing the relationship among a set of
scores.

forced-choice item Generally, any multiple-choice item in which the child is required
to make a choice of answers provided him.
grade equivalent A score that indicates a child’s average performance in terms of
grade and month. A grade equivalent of 7.2 is interpreted as the second month
of the seventh grade.

grade norm The average score that a child in a certain grade receives on a test.
group test A test that can be administered to one or more individuals at the same time
by one examiner.

individual test A test that can be administered to only one individual at a time.
intelligence quotient (IQ) The ratio of a child’s mental age (MA) to his chronological
age (CA). The formula is IQ = (MA/CA) × 100.
interpolation In general, any process of estimating intermediate values between two
known points. As applied to test norms, it refers to the procedure used in assigning
interpreted values (e.g., grade or age equivalents) to scores between the successive
average scores actually obtained in the standardization process. In reading norm
tables, it is necessary at times to interpolate to obtain a norm value for a score
between scores given in the table; e.g., in the table given here, an age value of
12-5 would be assigned, by interpolation, to a score of 118.


Score     Age Equivalent

120       12-6
115       12-4
110       12-2
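The interpolation described in the glossary entry can be sketched in Python as follows. The sketch assumes simple linear interpolation between adjacent rows of the three-row norm table above, with age equivalents read as years and months; the names `NORM_TABLE` and `interpolate_age_equivalent` are hypothetical, and a published norm table may use a different rounding rule.

```python
# Norm table from the glossary example: (score, age equivalent in months).
NORM_TABLE = [
    (110, 12 * 12 + 2),  # 12-2
    (115, 12 * 12 + 4),  # 12-4
    (120, 12 * 12 + 6),  # 12-6
]


def interpolate_age_equivalent(score, table=NORM_TABLE):
    """Estimate an age equivalent ("years-months") for a score inside the table."""
    points = sorted(table)
    if not points[0][0] <= score <= points[-1][0]:
        # Going beyond the table would be extrapolation, not interpolation.
        raise ValueError("score lies outside the range of the norm table")
    for (low_score, low_age), (high_score, high_age) in zip(points, points[1:]):
        if low_score <= score <= high_score:
            fraction = (score - low_score) / (high_score - low_score)
            # Round to the nearest whole month (an assumed rounding rule).
            months = round(low_age + fraction * (high_age - low_age))
            return f"{months // 12}-{months % 12}"


if __name__ == "__main__":
    print(interpolate_age_equivalent(118))  # the glossary example assigns 12-5
```

A score of 118 lies three fifths of the way from 115 to 120, so the estimate is 12 years 4 months plus about 1.2 months, which rounds to 12-5 as in the glossary example.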




ktock-taking" of an mdnadoal s ^ „f ,es, „sed to mcasote 

i„ the usual sense. The term 

achievement status prior to '“'^”"f,“7„?app,aise an individuars status m 

£5;rXSS-.-si" 

i5£i^iiaSHi2SS 

SSsE:4SS?HsiS?5 

without the labor involved tn ■‘“^..Richardson formulas are n 
common In test ‘‘''■'‘‘’P"''?^c.|j,., „t speeded tests, 
prlate for estimating the rehabtldj 
meat. See ariUmede t»ea». c, ,cs, scores is dmded 

median The point at uhtch a g 8 ‘“''4h'e„ce test is average or 

parts. Half the scores W'^f\ieh a score on on ■"''“'f""' c,ual to a mental 
mental age (AM) of 60 on an intelhgen'e made by a 

normal. For onathP''; ■' * ‘7 m is the averaa 


I age (AM) The age ‘“7' “f 60 on an '"“"'®7thaM'0Uld be made by a 
normal. For example, if a scor^ „,„gc score that 
age of 7 years 9 mon hs, then w ^ ^ 

random group of children y frequently In a dis " ejjjidren on a 
mode The score that occurs ^ 85 88,90.94.96,95,99. 

smple, the rnode score s 

test- 10, 30, 35, 55, 55, ss, ^ examinee s 

maltiple-ehaUe Hem A test item °Ehc"' “> 

or best answer f""" "'Xf ' „f muluple-ebm’* item 

mnltiple-respmie item A sp' .01. ^ , „[ cases in a distn u 

of the given choices may be ” ....csent the number 

„„boleommony^ method uMe^sl^cre^f^^r:! 

and with cases co avera^* test perfo^^ . describe 

further one ‘‘^'‘‘h r'^l Tf etd ^ ’"r * age. ««■> 

nerrm A way of f^^'srflus ages -'■>'^7" rtormaneea- Grade, 

erroups of students ol above-averag I j.fferencc of 

objective test A test whether respon 

opinion among scorers as 




It is contrasted with a “subjective” test, e.g., the usual essay examination, to
which different scorers may assign different scores, ratings, or grades. 
omnibus test A test (1) in which items measuring a variety of mental operations 
are all combined into a single sequence rather than being grouped together by 
type of operation, and (2) from which only a single score is derived, rather 
than separate scores for each operation or function. Omnibus tests make for 
simplicity of administration; one set of directions and one over-all time limit 
usually suffice. 

percentile A score in a group of scores below which falls the percentage of scores 
indicated by the given percentile. For example, the 25th percentile denotes the 
score below which 25 per cent of the scores fall. Thus the person obtaining a 
percentile rank of 25 is considered as equaling or surpassing 25 per cent of the 
group taking the same test. 

percentile rank The per cent of scores in a distribution equal to or lower than the 
score corresponding to the given rank. 

performance test In a way, every test may be considered a performance test. How- 
ever, pencil-and-paper or oral tests are not usually regarded as performance 
tests. Generally, a performance test requires the use and manipulation of 
physical objects and the use of physical and manual skills not restricted to oral 
and written answers. The important thing is that the test response is identical 
with the behavior about which information is desired. 
personality test A test that attempts to measure everything that constitutes a
person’s mental, emotional, and psychological makeup.
power test A test that attempts to measure level of performance rather than a child’s
speed in answering questions. There is little, if any, emphasis on time.
practice effect A term test people use in describing the influence of previous
experience with a test. For example, Johnny took the same test two months ago.
Will his previous experience with this test help him achieve a higher score?
profile A graphic portrait of a child's test results on several tests. 
prognostic technique A test used to predict future success or failure in a specific 
academic subject or field. 

projective test A method of testing to determine personality characteristics. The 
person is presented with a series of ink blots, pictures, unfinished sentences, 
and so on. The term projective is used because it is believed that a person
“projects” into his answers and statements his own needs and feelings.
quartile One of three points that divide the cases in a distribution into four equal 
groups. The lower quartile, or 25th percentile, sets off the lowest fourth of the 
group, the middle quartile is the same as the 50th percentile, or median; and 
the third quartile, or 75th percentile, marks off the highest fourth. 
random sample A sample of the people of a population made in such a way that
every person of the population has an equal chance of being included. This method
attempts to eliminate bias.

range The extent of difference between the highest and lowest scores on a test.
For example, 98 is the highest score and 10 is the lowest; therefore the range is
between 10 and 98.

raw score Usually the number of right answers, or some such formula as rights
minus one-half wrongs, which gives a total score.
readiness test A test that measures the degree to which a child has achieved certain
skills or information needed for undertaking some new learning activity. For




instruction. . ^aiiunee to supply the correct answer from 

recall iirm An item that nith a i«, i" "ht* 

his ow-n memory or recollection, “Columbus discovered Arnenca 

he need only identify the wrtwt discovered America m (a) 

MilUy The extent to nhmh a cMd „hatett 

,0 be readministered. That B. is r of a test, 

orr'S't.'prreri; eorrected. 

iSmTOK The tendency of a dist ki„ betiveen the reliability 

the mean. formula gi'''"S *' of the reliability of a test 

in e?rdtn«riu me" halves of the test 


of an entire test from th questions 

uKaUBy). 0 child's performance by 

■'’t ^n^ns™Hn a c™rtmn 

split-hatj aefficiiB A oo'l^ ^ „,bcr half, ” n’.numbered items- 

standard dndallon (- 5 °) ^ " J^“„„„d the mean. ..„ror of m®"""®' „ 

The rnore tho "corj .h mag jde onh score 

Standard error (5 ) amount by ' • amount such ^ 

in a scori^that «■ ^tt,c candard aiffe. b)' “mibr 


two-thirds or ^ score. » »»' i , obtained score j ,o 

true score b> no standard error. r^rtvicd" scores, 

about trvo .birds o * ,he rj^sum. ..,„„srorme 

„rnrofascore.th«" efemngtoam ,„=»». of 

.rorrdurdreore Age^c scores .»y b' “f 
comparabiiity* 





The simplest type of standard score is that which expresses the deviation
of an individual’s raw score from the average score of his group in relation to
the standard deviation of the scores of the group. Thus:

\[ \text{Standard score } (z) = \frac{\text{raw score } (X) - \text{mean } (M)}{\text{standard deviation } (SD)} \]

By multiplying this ratio by a suitable constant and by adding or subtracting
another constant, standard scores having any desired mean and standard
deviation may be obtained. Such standard scores do not affect the relative
standing of the individuals in the group nor change the shape of the original
distribution.

More complicated types of standard scores may yield distributions differing
in shape from the original distribution; in fact, they are sometimes used for
precisely this purpose. Normalized standard scores and K-scores (as used in
Stanford Achievement Test) are examples of this latter group.
standardized test A test that has definite directions for administering, scoring,
and use.

stanines A nine-point scale. It divides the norm population into nine groups. The
mean score from 1 to 9 is 5.

survey test A test that measures general achievement in a given subject or area,
usually with the connotation that the test is intended to measure group status,
rather than to yield precise measures of individuals.
test-retest coefficient A type of reliability coefficient obtained by administering the
same test a second time after a short interval and correlating the two sets of scores.
true-false item A test question or exercise in which the examinee’s task is to indicate
whether a given statement is true or false.
true score A score entirely free of errors of measurement. True scores are hypothetical
values never obtained by testing, which always involves some measurement
error. A true score is sometimes defined as the average score of an infinite
series of measurements with the same or exactly equivalent tests, assuming no
practice effect or change in the examinee during the testings.
validity A term used to designate the extent to which a test measures what it is
supposed to measure. For example, is the reading test measuring Bill’s reading
ability or his knowledge of science?
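The standard-score entry in this glossary (raw score minus mean, divided by standard deviation, then optionally rescaled by two constants) can be illustrated with a short Python sketch. Several things here are assumptions made for illustration: the raw scores are invented, the group's standard deviation is computed with the population formula, and the target mean of 50 and standard deviation of 10 are arbitrary choices of the "suitable constants" the entry mentions.

```python
from statistics import mean, pstdev


def z_scores(raw_scores):
    """Express each raw score as (X - M) / SD for its own group."""
    group_mean = mean(raw_scores)
    group_sd = pstdev(raw_scores)  # population SD; an assumed choice
    if group_sd == 0:
        raise ValueError("all scores are identical; SD is zero")
    return [(x - group_mean) / group_sd for x in raw_scores]


def rescale(z_values, new_mean=50.0, new_sd=10.0):
    """Multiply by one constant and add another to reach a desired mean and SD."""
    return [new_mean + new_sd * z for z in z_values]


if __name__ == "__main__":
    raw = [40, 50, 60]  # hypothetical raw scores
    z = z_scores(raw)
    print([round(v, 2) for v in z])
    print([round(v, 1) for v in rescale(z)])
```

Because the rescaling is linear, the order of the individuals and the shape of the distribution are unchanged, which is the property the glossary entry describes.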


References for More Extensive Glossaries 

English, H. B. and English, A. C. A comprehensive dictionary of psychological and
psychoanalytical terms. New York: Longmans, Green, 1958.

Warren, H. D. (ed.) Dictionary of psychology. Boston: Houghton Mifflin, 1934. 



APPENDIX 


Representative Tests 




This appendix includes "P'J^’projSire devices which r'q""' 
and school counselor Tests su ^ 


aiiu avuuui wu.ww— - - tn„U ttot COnStfUC thC 

training are not listed. Mhaustivc. The reader j . . — -ettheornissio" 

This test list is by no not should be mterp"' 

l“"SofVesUomeanapptoval ..a 


:rzrhrdTsirt-i; p„„tshe.. f”: 

The names of tests, grade rang ,P poblication infonnation ) 

(MMY) volume and page citat. jiscossion of so c 

reteren e research. (See Chapte^ “ oddr.sscs of pu 

The reader should refer to Append.* I' 


and A'f 




Name of Test                                  Grade Range*    Publisher                         MMY
                                              (or age)                                          (or date)

INDIVIDUAL INTELLIGENCE TESTS

Cattell Infant Intelligence Scale             3-30 mos.       Psychological Corporation         6-515
Full Range Picture Vocabulary Test            2 yrs. & up     Psychological Test Specialists    6-521
Minnesota Preschool Scale                     1-4) yrs.       American Guidance Service         6-528
                                                              (Educational Test Bureau)
Peabody Picture Vocabulary Test               2-18 yrs.       American Guidance Service         6-530
Pictorial Test of Intelligence                3-8 yrs.        Houghton Mifflin
Porteus Maze Test                             3 yrs. & up     Psychological Corporation         6-532
Quick Test                                    2 yrs. & up     Psychological Test Specialists    6-534
Stanford-Binet Intelligence Scale             2 yrs. & up     Houghton Mifflin                  6-536
Van Alstyne Picture Vocabulary Test           2-7 yrs.        Harcourt, Brace & World           6-537
Wechsler Adult Intelligence Scale (WAIS)      10 yrs. & up    Psychological Corporation         6-538
Wechsler Intelligence Scale for Children      5-15 yrs.       Psychological Corporation         6-540
  (WISC)
Wechsler Preschool and Primary Scale of       4-6 yrs.        Psychological Corporation         1967
  Intelligence (WPPSI)

DEVELOPMENT SCALES


Arthur Point Scale of 
Performance Tests: 

4 yrs. & up 

4-335 

Term I 

Stoeliing 


Pomi 11 

Psjxhological Corporation 


Hatley- Infant Scales of 

Birth to Psychological Corporation 

1963 

of Do'clopment 

15 mos. 


Ct-K-ll Detclopmenfal 



4 uks-6 >rs. Psychological Corporation 

6-522 


APTITUDE TEST BATTERIES


Academic Promise Test* 6-9 

(Am 

DifTcrcntial Aptitude Tots R-13, A 

(i).vn 


Psjcbological Corporation 6-766 
Psjchological Corporation 6-767 


• S\.(p iKii r.uff.Wr^ io<ite4»e KT»d«- ranr*- •hil* »sT i» ipccificalK dcsicmaitd by ycar» o 
irx- Ic'irf .\ lAdult). 





Grade Range 

Name of Test 


Publisher 


MMY 
(or date) 


Flanagan Aptitude Classifi- 
cation Tests (FACT) 
General Aptitude Test 
Battery (GATE) 
Multiple Aptitude Tests, 
1959 Edition 
SRA Primary Mental 
Abilities, Revised 


9-12, A 

16 yrs. & up 
6:9-12, A 
7-13 

k-12 


Science Research Associates 


U.S. Employment Service 

California Test Bureau 

Science Research Associates 


6-770 

6-771 

6-776 

6-7S0 


SCHOLASTIC APTITUDE TESTS 
Academic Ability Test 

American College Testing 
Program Examination 
California Test of Mental 
Maturity, 1963 Revision 
College Entrance Examina- 
tion Board Scholastic 
Aptitude Test (SAT) 
College Qualification Tests 
Cooperative Primary Tests 

Cooperative School and 
College Ability Tests 
(SCAT) (also Series 11, 
1968) 

Culture Fair Intelligence 
Test 

Goodenough-Harris 
Drawing Test 
Graduate Record 

Examinations (GRE) 
Henmon-Nelson Tests of 
Mental Ability, Revised 
Edition 

Kuhlmann-Andcrson 

Intelligence Tests, 
Seventh Edition 

Lorge-Tliorndike 

Intelligence Tests 
(multilevel edition) 
Miller Analogies Tc^t 
Ohio State University 
Psychological Test, 
Forms 21 and 23 


12 & Jr. 

Coll. 
4-16, A 

11 &«P 


U&up 

1-3 

4-16 

4-13 yrs- & 
10-16, A 
3-15 yrs. 

16-17 

3-17 

k -!2 

k-12 


Grad. Sch. 
9-16. A 


EduMlioMlTestoEScnicc 1963 

(Cooperative Test Division) 

American College Testing 6-1 

Program , 

California Test Bureau 

Edua<ion,IT«tinEScm« 6-«’ 

(forCEEB) 

PsycholOBicl^Corporation MS® 
Educational Testing Ser%ncc j 

^CooPcrativeTestDivis'O ) 

,„s.Uu.eotP«.on»r..y."J WS3 

Educational Testing Service 

6-462 

Houghton 

6-466 


Personnel Press, Inc. 

Houghton 

ESS""'- 


(6-467) 

1964 





Name of Test                                  Grade Range     Publisher                         MMY
                                              (or age)                                          (or date)

Otis-Lennon Mental Ability Test               1-16            Harcourt, Brace & World           (6-481)
                                                                                                1967-1968
Pintner General Ability Tests, Revised        K-12 & up       Harcourt, Brace & World           (5-368)
                                                                                                1966-1968
Progressive Matrices                          5 yrs. & up     H. K. Lewis & Co., Ltd.           6-490
                                                              (U.S. distributor:
                                                              Psychological Corporation)
SRA Tests of Educational Ability,             4-12            Science Research Associates       6-495
  1962 Edition (TEA)
Tests of General Ability (TOGA)               K-12            Science Research Associates       6-496


SPECIAL APTITUDE TESTS 
Mathematics:

Lee Test of Geometric Hi. Sch. 

Aptitude, 1963 Revision 
Orleans-Hanna Algebra Hi. Sch.

Prognosis Test 

Orleans-Hanna Geometry Hi. Sch.
Prognosis Test 

Foreign languages 

Modern Language 9-16, A

Aptitude Test ’ 

Modern Language 3-6

Aptitude Test — 

Elementary 

Pimsleur Language 6-12 

Aptitude Battery 

Mechanical aptitude tests: 

Revised Minnesota Paper 9-16, A 
Form Board Test 

SRA Mechanical Aptitudes 9-12 A 
Test of Mechanical 9&up 

Comprehension (Bennett) 

Motor tests: 

Purdue Pegboard 9_1 6, A 

Stromberg Dexterity Test A 

Clerical aptitude tests: 

Minnesota Clerical Test 8-12. A 


California Test Bureau 

6-647 

Harcourt, Brace & World 

Harcourt, Brace & World 

(4-396) 

1968 

(4-427) 

1968 

Psychological Corporation 

6-357 

Psychological Corporation 

1967 

Harcourt, Brace & World

1966 

Psychological Corporation 

6-1092 

Science Research Associates 
Psychological Corporation 

4-764 

6-1094 

Science Research Associates 
Psychological Corporation 

6-1081 

4-755 


Psychological Corporation 6-1040 





Grade 

JVame 0/ Te%t 


Publisher 


iW.ur 

(or daie) 


12-16, A Stoclting 


Short Employmerit Tests 
Short Tests of Clerical 
Ability 

Artistic aptitude tests : 

Horn Art Aptitude 
Inventory 

Meier Art Tests: ^ 

1. Art Judgment 

2. Aesthetic Perception 
Musical aptitude tests: 

Musical Aptitude Profile 

Seashore Measures of » 

Musical Talents. Revised 
Edition „ _ .m 

wing Standardized Testa 8 yta- P 

of Musical Intelligence 


PsychologialCotrotalion 

Sdence Research Associates HOtt 


5- Z42 

6- 346 


ELEMENTARY ACHIEVEMENT TEST BATTER 

Iowa Testa of Basic Skills 3-9 

Metropolitan Achievement i-i‘ 

Test , j Q 

SRA Achievement Senes 
Sequential Tests of 
Educational Progress 

(STEP) ,.9 

Stanford Achievement 
Test, 1964 Revision 

...cnaciooL achievement B'v™® 
Iona Tests of Educational 9-12 
Development , j2 

Metropolitan Achievement I 

^ f 4-14 

Sequential Test ol 
Educational Progr 

(STEP) - 1-9 

Stanford Achievement 

Test: High School 

j* or «e< r*nfi^' 

• No specifie pr*! 


Bureau of Educational 
Research and Service, 

University of lott-a 

Houghton Mifilln /Lttt 

PsycLlogical Corporation 6-353 

National Poundatienfoj 

Educational Bcsi^'h m 
England and \Val«. ‘Ii' 

Mere, Upton Park, 

Slough, Bocks, England 

6-13 

til 

HarcouN, Braces 'Vot'd 

science ReseamhAss^ate, 6-14 

Hareourt, Brace A 

•rr-^iific Service 6* 

Education 

(Cooj^rt"'' I** 

„!c'VotlJ 19'“' 

Harcouit, 





A^ome of Test Publisher 


MMY 
(or date) 


SPECIAL ACHIEVEMENT TESTS


Blyth Second-Year 

Algebra Test — Revised 
Edition 

Hi. Sch. 

Cooperative Science Tests 

6- 9 

7- 9 

Hi. Sch. 
Hi. Sch. 
Hi. Sch. 

Cummings World History 
Test 

9-13 

Dunning-Abeles Physics
Test 

11-13 

MLA Cooperative Foreign 
Language Tests 

Hi. Sch. 

Modern Math
Understanding Test 

4-9 

Nelson Biology Test- 
Revised Edition 

READING TESTS

9-13 

Davis Reading Tests 

8-13 

Diagnostic Reading Scales 

1-8 

Diagnostic Reading Tests 

K-13

Durrell Analysis of 

Reading Difficulty, 

New Edition

1-6 

Gates-McKillop Reading 
Diagnostic Tests 

2-6 

Nelson-Denny Reading 

9-16. A 

Stanford Diagnostic 
Reading Test 

2-8 


INTEREST INVENTORIES

Brainard Occupational 8-12 A 

Preference Inventory 
Gordon Occupational Hi. Sch, 

Check List 


Harcourt, Brace & World 

(5-443) 

1966 

Educational Testing Service 

6-867a, 

(Cooperative Test 

6.872a, 

Division) 

6-887a, 

6-909a. 

6-931a 

Harcourt, Brace & World 

(5-817) 

1966 

Harcourt, Brace & World 

(5-753) 

1967 

Educational Testing Service 

6-378, 

(Cooperative Test 

6-392, 

Division) 

6-402, 

6-416, 

6-426 

Science Research Associates 

1966 

Harcourt, Brace & World

(5-728) 

1965 


Psychological Corp. 

California Test Bureau 6-821 

Committee on Diagnostic 6-823 

Reading Tests, Inc., 

Harcourt, Brace & World 5-660 

Teachers College Press 6-82+ 

Houghton Mifflin 6-800

Harcourt, Brace & World 1967 


Psychological Corporation 1968 
Harcourt, Brace & World 6-1056 





Grade Range 

Name of Test (jge) 


MMY 

Publisher (of date) 


Guilford-Zimmerman
Interest Inventory 
Holland Vocational 
Preference Inventory 
Kuder General Interest 
Survey 

Kuder Occupational 
Interest Survey 

Kuder Preference 
Record— Personal 
Kuder Preference 

Record— Vocational 

Minnesota Vocational 
Interest Inventory 
Strong Vocational 
Interest Blank (SVIB) 


10-16, A 

Sheridan Psychological 

6-1057 

Colleges & 

Services 

Consulting Psychologists 

6-ns 

A 

6-12 

Press 

Science Research Associates 

6-106U 

9-16, A 

Science Research Associates 

(6-1062) 

196+- 



1966 

9-16. A 

Science Research Associates 

6-132 

9-16. A 

Science Research A^ociates 

6-1063 

Hi. Sch. & 

Psychological Corporation 

1965 

A 

17 yrs. & 
up 

Stanford University Pres*. 

(f>-l070. 

6.1071) 

1966, 

1968 


PERSONALITY AND ATTITUDE iWENTOB 
Adjustment Inventory ^ 
California Psychological 1 3 yre- & 

Inventory (CPI) i2-i6. A 

California Test of 

Personality College & 

Edwards Personal . 

Preference Schedule 

(EPPS) 

Eysenck Personality 
Inventory 

Gordon Personal 
Inventory 

Gordon Personal Frofilo 

Gmirord-Zimmcman 

Temperament Sort . 

IPAT Childrens 

Pcrsonalni' U"'' . 16 

Minnesota Molt.ph«'= 

Personality InventoO 

Minnesota 

Attitude invent”'!' 


Hi- Sc't- 
College. 
A 

8- 16. A 

9- 16. A 
12-16. A 

8-12}^ 


iyrs.& 

up 

12-17 


Consulting Psyehologists 

CoSing rsjehnlnP"’ 
CaXniaTe" 

Psyehologiol Corporation 


6-S9 

6-71 

6-73 

6-87 


: 3 <ionaland Industrial 
sting Ser^nce 

ourt. Brace S: World 

»un. Brace 5 : World 

idan Psychological 

nices 

tutc for Personality 
d Ability Toting 
hologica! Corroratio" 

hologiol Corporation 


6-93 

6-102 

6-103 

6-110 

6-122 

6-143 

6-099 





APPENDIX C 


Name of Test 

Grade Range 
(or age) 

Publisher 

MMY 
(or date) 

Mooney Problem Check 
List 

7-16, A 

Psychological Corporation 

6-145 

Personality Inventory 
(Bernreuter) 

9-16 

Consulting Psychologists 
Press 

6-157 

Sixteen Personality 

15 yrs. & 

Institute for Personality 

6-174 

Factor Questionnaire 

up 

and Ability Testing 


Study of Values 

13 & up 

Houghton Mifflin 

6-182 

Thorndike Dimensions 
of Personality 

11 & up 

Psychological Corporation 

1963, 

1966 


INDEX 


Achievement test batteries, 434 
arithmetic, 272-73 
content areas, 277-78 
language skills, 273-75 
reading, 270-72 

study skills, 275-77 77/^77 

SRA Achievement Series, 27 
vs. survey achievement tests, 2 
word meaning, 268, 270 
Achievement tests, 31, . ^ 

(see also Teacher-made tests) 

compared to aptitude tests, 256-57 

construction of, 257-65 
content outline, 258-60 
reasons and objectives, W 
sample adminlstration,^-6^j_^5 

standardization 

diagnostic achievement tests, 283-86 
arithmetic, 285-86 
reading, 283-84 
use of, 283 

differences between teacher-made and 

standardized tests, 371- 

72, 372-73/. 

^'""^"fese.nen. 

terics) 267/. 

scope and conte . - 257-94 

major uses in the school. 

grouping. 29--^ j for intef' 

identification of 

,cTeb.|3;’' .mrla"'>'"f 

tcacbet aids m 1 
290-91 


nndersrandinE rbe student B6-83 

from, 279*82 

sa. acbievtmmr test baiimcs, . 

when to use, 278-84 

types of, 267-86 

act (Jtf American t,oiir, 

Prostam) p.jcbolopcal 

basic difTcrenee between 
main Test. 315 

^coi'or. and 

Social Studies RcadieR ,.s_i9 

i::“en.rtof.l«. 

testincdato,31. ^ A,- 

Ametiean Wucane^lJ 

soc“t«>n,6- ' \ ^ ;j„,„ Af 

AmerieanP.n.onneUnd‘- 

■“”,Cbotiieali^''«^‘’‘"' • 





Anxiety (Cont’d) 
effects of, on test scores, 80-82 
questionnaire to ascertain, 81 
Aptitude tests, 219-55 

compared to achievement tests, 256- 
57 

function of, 228, 230 
graduate school tests, 251 
scholastic (see Scholastic aptitude 
tests) 

using results of, 252-54 
vocational (see Vocational aptitude 
tests) 

Arithmetic achievement tests, 272-73, 
285-86 

Arithmetic average (see Mean) 

Army Alpha and Beta Tests, 42-43, 192 
Army General Classification Tests, 146 
147 

Arthur Performance Scale, 189-90 
Artistic aptitude tests, 31, 251 
Attitude inventories, 339-44 
Allport-Vernon-Lindzey Study of 
Values, 341, 343-44 
Minnesota Teacher Attitude Inven- 
tory, 339 

Survey of Study Habits and Attitudes, 
339-340 

Averages (see Measures of central ten- 
dency) 

Bender-Gestalt Test, 213, 214 fig. 
Bennett Test of Mechanical Compre- 
hension, 234-35 
Binet, Alfred, 60, 138, 170, 201 
Binet intelligence scales (see Stanford- 
Binet Intelligence Scale) 

Black, H., 32, 33 
Blacky Pictures Test, 209-210 
Buswell-John Diagnostic Test for Fun- 
damental Processes in Arith- 
metic, 285-86 


California First-Year Mental Scale, 
194, 195 

Cattell Culture Free Intelligence Test, 
192 

Cattell Infant Intelligence Scale, 194, 
195 


Central tendency, measures of (see 
Measures of central tendency) 
Cheating, 86 

Children's Apperception Test (CAT), 

210 

Class interval, 124-26 
Coefficient of reliability, 107-109, 111- 
15 

height of, 116 

standard error of measurement, 109-12, 112/. 

Coefficient of validity, 99 
standard error of estimate, 99-100 
College Board Achievement Tests, 308- 
14 

English composition, 309-12 
foreign languages, 312 
history, 312 
list of tests, 308-309 
mathematics, 313 
science, 313 
scores on, 309 
interpretation of, 313-14 
College Classification Tests, 297 
College Entrance Examination Board 
(CEEB), 62, 83, 147, 296, 297- 
314 

achievement tests (see College Board 
Achievement Tests) 
history of, 297-98 

PSAT (see Preliminary Scholastic 
Aptitude Test) 

SAT (see Scholastic Aptitude Test) 
test dates and schedule, 300 
testing and service centers, 298, 299 
fig. 

College entrance examinations, 296-323 
American College Testing Program 
(see American College Testing 
Program) 

college-administered tests, 297 
College Entrance Examination Board 
(see College Entrance Examina- 
tion Board) 

practical meaning of, 320-22 
recent studies on, 319-20 
College Qualification Tests (CQT), 297 
Concurrent validity, 91n. 

Construct validity, 101-103 




Construct validity 

techniques and procedures used in 
determination of, 102 
what to look for in test manual, 
103 

Content validity, 92-93, 103 
what to look for in test manual, 93 
Cooperative Primary Tests, 258 
reading test of, 270, 271 fig. 
Correlation, 96-97, 135-36 
linear, 136 

Correlation coefficient, 99, 107, 136 
CQT (see College Qualification Tests) 

Criterion-related validity, 94-100, IIU 
methods used in establishment of 
correlation procedures, 96- « 
expectancy table, 9t-96, 9o fl- 
961., 97/. „ 

what to look for in test manual, 100 
Criticisms of testing, IW4, 59 
test items and specific issi®- “ " 
multiple-choice tests, 32-34 
testing and the individual, 20 ;U 
abridgement ot human choice, u 

23 

damaged self-esteem, 20-22 

limited evaluation, 22 

testing and institutions, 23-iia 

testing and society, 

cultural bias, 26-29 

too much testing, 

Cultural bias, 26-29 
criticism of, in IQ tests, 26-27 


Doren Diagnostic Reading Test, 258- 
60, 283-84 

Draw-A-Person Test (DAP), 211-12 

Educational Testing Service, 33-34, 
251, 297 

Empirical or statistical validity, 92n. 
Environment vs. heredity, IQ and, 37-41 

ta/iiy »/ 

(HEW), 4445 

Essay test, teacher-made, 

advantages and disadvantages of, 389- 

90 

“case history” of, 394-95 

compared to objective test, 383-84 

construction of, 390-9 

reliability of, 386-88 
■■hJ^— ttJcholegU^^ 

96,9S)ig-.S6h.Wh 
Fechner,G.T., 7"- 33 , 

^ ^rFACT), 236n. „,a_8;0 

. ' aotitt 


criticism of, in lU tests, 2^27 ‘‘i^^ie^apthude «*■ 7^?;? 

'Sg- K^lTSibl^. >7«- 

Test, 192 

Progressive Matrices, IVi 
Curricular validity, 91n. 


iagnostic achievement 'S“gnoltic 
Achievement test . 
achievement 


General Clerical Test 

Gesell, Arnold, W 193- 

»5 _ 


Oevclopmentai 

Vchievement tests) n totlwtfS ChiW 

ifferential Aptitude Gese ,j readiness ’cement 

236-40. 436 . as , common measure 

excerpts from 'Jon' ?,iffetential terms, 464-7» , 190, 

Tesl” 24(MS . Coodenough 

Aptitude Tests, educational •, 

isadvantaged (the) 

reality, 47-52 





Grading 

assignment of report card grades, 
422-24 

classroom participation, 423 
combining data from different 
sources, 423-24 

examinations and quizzes, 422-23 
papers and reports, 423 
assignment of test grades, 418-22 
grading on the curve, 419-21 
percentages, 419 
essay tests, 386-88, 393-94 
formulation of philosophy and ap- 
proach to, 417-18 
purpose of grades, 418 
Graduate Record Examination (GRE), 
251 

Graduate school aptitude tests, 251 
Graphic representation, 126-28 
frequency polygon, 126-27, 128 fig. 
histogram, 126, 127 fig. 

Gray’s Oral Reading Paragraphs, 283 
Group standardized testing (see also 
Standardized tests) 
achievement tests (see Achievement 
tests) 

aptitude tests (see Aptitude tests; 

Scholastic aptitude tests) 
attitude inventories (see Attitude in- 
ventories) 

college entrance examinations (see 
College entrance examinations) 
interest inventories (see Interest in- 
ventories) 

personality inventories (see Personality 
inventories) 

Heredity vs. environment, IQ and, 37- 
41 

High School Placement Test, reading 
test of, 270-72 

High schools, testing programs of 
comparison of two schools, 28-29/. 
suggestions for, 435-37 
Histogram, 126, 127 fig. 

Hoffmann, Banesh, 32-33 
Horn Art Aptitude Inventory, 251 
House-Tree-Person Test (HTP), 212- 
13 


IBM answer sheet, 75, 77 fig. 

Individual tests of intelligence, 169- 
97 

culture-fair tests, 31, 190, 192-93 
infant and preschool (see Infant and 
preschool tests) 
nonlanguage, 188-90 

Arthur Performance Scale, 189-90 
Goodenough Draw-A-Man Test, 
190, 191 fig. 

Nebraska Test of Learning Apti- 
tude, 190 

Pintner Non-Language Test, 190 
Stanford-Binet Intelligence Scale (see 

Stanford-Binet Intelligence Scale) 

Wechsler Scales (see Wechsler Scales) 
Individual tests of intelligence and per- 
sonality, 169-216 

individual tests of intelligence, 169- 
97 

projective techniques (see Projective 
techniques) 

Infant and preschool tests, 193-95 
California First-Year Mental Scale, 
194, 195 

Cattell Infant Intelligence Scale, 194, 
195 

Gesell Developmental Schedules, 

193-94, 194 fig., 195 
Kuhlmann-Binet, 194 
Merrill-Palmer Scale of Mental Tests, 

193 

Minnesota Preschool Scale, 195 
Wechsler Preschool and Primary 
Scale of Intelligence, 178, 187, 195 
Ink blots (see Rorschach Test) 
Instructional objectives, 375-79 
general, 375-76 
specific, 376 

Intelligence, definitions of, 34-35 
Intelligence tests, 36 (see also Individual 
tests of intelligence) 

Army Alpha and Beta, 42-43, 192 
criticism of, 21, 26-27 
disadvantaged (the) and, 47-52 
elimination of, in New York City 
schools, 26 

history of, 60-62, 170-71, 176-78 
major problem area in testing, 34-36 





instructions and sample problems, 
353 fig. 

self-scoring answer sheet, 75, 76 
fig., 352 

Kuder-Richardson "formula 20," 108n., 
117, 301, 302 
Kuhlmann-Binet, 194 
Kuhlmann-Finch Scholastic Aptitude 
Tests, 224, 225 fig. 


Intelligence tests (Cont’d) 

research findings of, concerning racial 
and socioeconomic groups, 42ff. 

Stanford-Binet (see Stanford-Binet 
Intelligence Scale) 

Interest inventories, 344-65, 437 
Kuder Interest Inventories (see Kuder 
Interest Inventories) 
main objective of, 344-46 
Mmneota 136 

364-66 

(SVIB), 346-51 .. -.e-i 107 

Inventories (see Attitude inventories; 

Interest inventories; Personality 
inventories) 

IQ (see also Intelligence tests) 
classical formula for, 149-50 
classifying Stanford*Dinct IQ s. 

76, not. „ 

classifying Wcchsler IQ’s. 1* " * 

182t. . , , . 

comparison of scales of Stanford- 
Binet and Wechsler, 181 
heredity vs. environment, 37-41 
racial and socioeconomic groups and, 
42ff. 
standard deviation of Stanford-Binet, 
149 
what it means and what it does not 
mean, 35-36 

IQ tests (see Intelligence tests) 

Jung, Carl, 211 

Jensen, A., 39, 41, 42, 46, 54 

Kuder Interest Inventories, 351-60, 
364, 365 

General Interest Survey, 
intention of, 358 
meaning of ten interests 
measured by, 356-57 
Profile Leaflet, 354, 355 fig. 
scores, 357-58 
Occupational Interest Survey (OIS), 
359, 360 fig., 365 
Preference Record, 351-3 


208-209 

Mann, Horace, 386, 397 

Manuals, test (see Test manuals) 

Mathematical aptitude tests, 249 
Mean, 129-31, 133 

Measutemenl 

answer sheet of. 25, 77 

Measures of central tendency, 128-32 

mean, 129-31, 133 
median, 131-32, 133 
mode, 132, 133 

Measures of variability, 133-36 

correlation, 96-97, 135-36 

standard deviation, 111n., 133-35, 

145, 147, 149 

Median, 131-32, 133 

Meier Art Judgment Test, 251 

Mental ap, „,tion of, by 

^'"“‘ivthiTs*. >82-83. 183... 

>83-87 ,59- 

llmUl A7mt.re»a»'- « 

60. 219 

Merrill, Ma»dA_. ,,„„, Tests, 

MetriU-Palmet Scale 

>’3 „ Readiness Test, 

Mettopohta" >'“'“ 3 

246-47 251 

Mdler Analogic ‘ 231.32 

Minnesota personality In- 

Mmuesotah'-hP”^,, 32-39 
,entoty(M3'P'' 337 

compn-'-fS 

format of, 332 

scores, 333-37 





MMPI (Cont’d) 

“clinical scales,” 333-37 
Finney-Auvenshine program, 337 
interpretation of, 335-37 
“validity scales,” 334 
Minnesota Preschool Scale, 195 
Minnesota Rate of Manipulation Test, 
233 

Minnesota Teacher Attitude Inventory, 
339 

Minnesota Vocational Interest Inven- 
tory (MVII), 361 
profile sheet, 361, 362-63 fig. 

MMPI (see Minnesota Multiphasic 
Personality Inventory) 

Mode, 132, 133 

Modern Language Aptitude Test, 249- 
50 

Mooney Problem Check List 
four forms of, 330-31 
major uses of, 330 

typical items from junior high school 
form, 331-32 

Motivation, importance of, on test 
scores, 78-80 

MRC answer sheet, 75, 77 fig. 
Multiple-choice tests, 402-408 
areas of effective measurement by, 
403 

criticisms of, 32-34 
suggestions for development of, 405- 
408 

Musical aptitude tests, 31, 250 
MVII (see Minnesota Vocational In- 
terest Inventory) 

National Council on Measurements 
Used in Education, 62, 63, 

91 

Nebraska Test of Learning Aptitude, 
190 

New York City, elimination of IQ tests 
by, 26 

Normal curve, 133, 133 fig., 134-35, 
145, 147t. 

showing standard deviations, 134 fig. 
Norms, 67n., 136-45 
age norms, 137-38 
educational age, 137-38 


mental age, 138 
grade norms, 138-40 
percentile norms, 140-41, 142L, 143/., 
144-45 

practical usage of, 151-57 
what to look for in test manual, 154, 
157 

Objective test, teacher-made, 381-84, 
397-416 

analyzing test results, 412-13 
“case history” of, 413-16 
characteristics of, 398 
compared to essay test, 383-84 
items, types of 

completion items, 400-401 
matching items, 401-402 
multiple-choice {see Multiple-choice 
tests) 

true-false, 399-400 
mechanical operations of test con- 
struction, 409-12 
assembly, 409 
directions, 410 

editing and arranging, 409-10 
item writing, 409 
physical layout, 411 
reproduction, 411 
scoring, 412 

rules for item writing, 398-99 
types of, 382 

Objectives, instructional, 375-79 
O’Connor Finger and Tweezer Dexter- 
ity Tests, 233-34 

Ohio State University Psychological 

Test, 297 

Percentile equivalents, 134, 134 fig. 
Percentile norms, 140-41, 142^., 143^., 
144-45 

Personality, meanings of, 324-25 
Personality inventories, 324-38, 437 
basic problems in use of, 338-39 
MMPI (see Minnesota Multiphasic 
Personality Inventory) 

Mooney Problem Check List, 330-32 
school and, 337-38 
TDOT (see Thorndike Dimensions of 
Temperament) 




Reliability (Cont’d) 

standard error of measurement, 
109-12, 112/. 
factors affecting, 113-15 
ability level of group, 115 
length of test, 113 
method used, 115 
range of talent, 114, 114/. 
interpretation of, 112-15 
methods used in estimation of 
alternate forms of original test, 
106-107 

retesting with same test, 105- 
106 

“split-half,” or “odd-even,” cor- 
relation, 107-108 

what to look for in test manual, 116- 
17 

Report cards, 424 (see also Grading) 
Representing validity, 91-92n. 

Response sets, 80 
Rorschach, Hermann, 201 
Rorschach Test, 200-205 
administration of, 201-203 
evaluation of, 205 
history of, 201 
interpretations of, 204-205 
overview, 200-201 
scoring procedures, 203-204 
Rotter Incomplete Sentences Blank, 
211 

SAT (see Scholastic Aptitude Test) 
SCAT (see School and College Ability 
Tests) 

Scatter diagram, 135, 137/. 

Scholastic Aptitude Test (SAT), 13, 
298, 301-308, 436 
ambiguous questions in, 33-34 
basic difference between ACT and, 
296 

effect of tutoring on, 83-84 
findings of recent studies, 319-20 
preparation for, 83-84, 303 
sample questions from, 303-308 
mathematical section, 305-308 
verbal section, 303-305 
scores on, 302 

standard error of measurement for, 
302 


Scholastic aptitude tests, 27, 169, 219- 
28, 434, 435 
basic purpose of, 29 
interpretations of results, 219-20 
Kuhlmann-Finch, 224 
Primary Mental Abilities Test, 221-24 
School and College Ability Tests 
(SCAT), 224, 226-28, 229 fig., 
297 

when to administer, 220 
School ability tests, 30, 31 
School and College Ability Tests 
(SCAT), 224, 226-28, 229 fig., 
297 

School testing program (see also Stan- 
dardized tests) 

determining objectives of, 430-32 
practical needs, 431 
work of testing committee, 430- 
41 

guidelines and steps in development 
and implementation of, 429-30 
planning and using, 429-60 
recording test scores, 441 
reporting test results to students and 
parents, 457-60 

sample program (see South Bend 
Community School Corporation) 
scheduling of tests, 437 
selection of tests, 432 
South Bend Community School Cor- 
poration (see South Bend Com- 
munity School Corporation) 
suggestions for, 432-37 
kindergarten through sixth grade, 
433-35 

seventh through twelfth grade, 

435-37 

test orientation, 439-41 
for faculty, 439 
pretest, for students, 440-41 
selection and orientation of test 
administrators, 439-40 
for two high schools, 28-29/. 
when to begin planning, 432 
Science Research Associates, 297 
Scores, test (see Test scores) 

Scoring procedures 
errors in, 78 

as a factor in selecting test, 118-19 




Scoring procedures (Cont’d) 
machine scoring, 75-78 
of Rorschach Test, 203-204 
in school testing program, 450-51 
scoring stencil, 74-75 
self-scoring answer sheet, 74, 76 
of Thematic Apperception Test, 206- 
207 

Seashore Measures of Musical Talents, 
250 

Sentence completion tests, 211 
Sequential Tests of Educational Progress 
(STEP), 268n. 

mathematics section of, 272, 273 fig. 
Shuey, A. M., 45, 46, 54 
Simon, Theodore, 60, 170 
Society for the Psychological Study of 
Social Issues, 47 

South Bend Community School CoiTor- 

ation, testing program of. 4+1 w 

evolution of, 442 
mechanics of testing. W-SO 
program evaluation, 455 
schedule of tests, 452-54/. 
test material purchasing, storing, 
distribution, 451 

test rationale, 445-47 ,3 

test results and their use. ^ 
test score interpretation. . -j 

test scoring and reporting. 450-51 
testing philosophy. 108.115 

SRAA«. Scdc,. ample hem. 

BRA So? V 

from,2/M2 133-35, 145. 

Standard deviation. Ill .. 

147, 149. , 99.100 

Standard 

Standard error of meat 

''"■■^"'•i45'-4S,>50 

146 abo 

inn prostam) 


of achievement, advantages and lim- 
itations of, 372-73/. 

ACT (see American College Testing Program) 

control of distribution and sale of, 63, 
64-65 

differences between teacher-made 
achievement tests and, 265-67, 
371-72 

practical approach to use of, a- 
publishers of, 461-63 
reasons for using, 10-16 
academic reality, 12-13 
discovering educationally and so- 
cially maladjusted children, 14 
educational and vocational goals. 

evaluating capability and accom- 
plishment, 12 

groupine. 1 “-” 

IJSSutmly 

lion, 11-12 
transfer students, 14 

SAT (see Scholastic Aptitude Test) 

uhoronorlobo uKd >(.-18 

assigning gt.dcs, 1 > 

evaluating schwl. I'-'* 
teacher etaloat.on, IS 

s:',2?3:ro2,io5.iod.»8. 

116.150.151 

steps^lhnuing eon.truetion of 

sehtSorv-lo^P-^T'S- 36. 

eh.r.c.e.hticaol.l7jl-'S 

Iwl*. *71-^5 _ 
hUtory of. f- 0 . y 

IQ cau-fiones ^ 

standard dcMation of. 14). IM 

Sonineacorcs. 14M5 





Statistical or empirical validity, 92n. 
Statistics, 121-57 

averages (see Measures of central 
tendency) 

class interval, 124-26 
frequency distribution, 124, 126, 130- 
31 

graphic representation, 126-28 

language of, 122-23 
measures of central tendency, 128-32 
measures of variability, 133-36 
correlation, 96-97, 135-36 
standard deviation, llln., 133-35, 
145, 147, 149 

normal curve, 133, 133 fig., 134-35, 

145, 147t. 

showing standard deviations, 134 

fig. 

norms (see Norms) 

percentile equivalents, 134, 134 fig. 

ranking, 123 

rationale for, 121-22 

scatter diagram, 135, 137/. 

Strong Vocational Interest Blank (SVIB), 
346-51, 365 

basic purpose of, 346-47 
history of, 346, 350-51 
men’s form, 347-48 
profile form, 349 fig., 350 
sample items from, 347 fig. 
scores, 348, 350 
women’s form, 348 

Survey achievement tests (see Achieve- 
ment tests: survey achievement 
tests) 

Survey of Study Habits and Attitudes 
(SSHA), 339-40 
diagnostic profile for, 342 fig. 

SVIB (see Strong Vocational Interest 
Blank) 

Symonds Picture-Story test, 210 
Taxonomy, 375-76 

Taxonomy of Educational Objectives, 375, 
376 

TDOT (see Thorndike Dimensions of 
Temperament) 

Teacher-made tests, 5-6, 371-416 (see 
also Achievement tests) 


of achievement, advantages and limi- 
tations of, 

differences between standardized 
achievement tests and, 265-267, 
371-72 

qualities of a good test, 380-81 
reasons for giving, 5-6, 372-74 
certification of pupil achievement, 

6 

incentive, 6 

measuring outcomes of instruction, 

6 
steps in construction and admini- 
stration, 380 
types of, 381-84 

essay (see Essay test, teacher- 
made) 

objective (see Objective test, 
teacher-made) 

Terman, Lewis M., 61, 170 
Test administrator, duties of, 68-71 
Test evaluation, 90-120 
practicality in, 117-19 
reliability (see Reliability, in evaluat- 
ing a test) 

validity (see Validity, in evaluating a 
test) 

Test information, sources of, 158-66 
journals, 164-65 

Mental Measurements Yearbooks, 159- 
60 

Standards for Educational and Psycho- 
logical Tests and Manuals, 6, 
66ff., 92-93, 102, 105, 106, 113, 
116, 150, 151 

test publishers, 162-64, 461-63 
Tests in Print, 160 

texts and reference books, 160- 
62 

Test manuals 

content of, 67-68 
standards for, 66-68 
what to look for 

construct validity, 103 
content validity, 93 
criterion-related validity, 100 
essential normative data, 154, 157 
reliability, 116-17 
writing of, 262 




Test scores (see also Grading; Standard 
scores) 

American College Testing Program 
(ACT), 316-18 

College Board Achievement Tests, 
309-14 

differences of, between racial and 
socioeconomic groups, 41-52 
factors affecting, 78-86 
anxiety, 80-82 
motivation, 78-80 
practice and coaching, 82-85 
of Kuder General Interest Survey, 
357-58 

proper interpretation of, 63-64, 
451 

raw and converted, 72n. 
recording of, in school testing pro- 
gram, 441 

reporting of, to students and parents, 
456-58 

Scholastic Aptitude Test, 302 
Strong Vocational Interest Blank, 
348, 350 

of Wechsler Scales, 181-82 
Test security, 64-65 
Testing 

contemporary issues and problems, 
20-55 

criticisms of, 59 (see also Criticisms of 
testing) 

problem areas (see Problem areas of 
testing) 

reasons for, 3-19 

standardized tests, 6-16 (see also 
Standardized tests) 
teacher-made tests, 372-74 (see also 
Teacher-made tests) 

Testing program (see School testing 
program) 

Tests (see also Psychological tests; 
School testing program) 
achievement (see Achievement test 
batteries; Achievement tests) 

ACT (see American College Testing 
Program) 

administration of (see Psychological 
tests: administration of) 
aptitude (see Aptitude tests) 



attitude inventories (see Attitude 
inventories) 

college entrance examinations (see 
College entrance examinations) 
culture-fair, 31, 190, 192-93 
group standardized testing, 219-367 
(see also Group standardized 
testing; Standardized tests) 
infant and preschool tests (see Infant 
and preschool tests) 
intelligence (see Individual tests of 
intelligence; Intelligence tests) 
interest inventories (see Interest in- 
ventories) 

IQ (see Intelligence tests) 
list of representative tests, 471-78 
multiple-choice (see Multiple-choice 
tests) 

music and art aptitude, 31 
objective (see Objective test, teacher- 
made) 

personality, 31 (see also Projective 
techniques) 

projective techniques (see Projective 
techniques) 

psychological (see Psychological tests) 
reading (see Reading tests) 

SAT (see Scholastic Aptitude Test) 
scholastic aptitude tests (see Scholastic 
aptitude tests) 
school ability, 30, 31 
scoring of (see Psychological tests: 
scoring procedures; Scoring pro- 
cedures) 

standardized (see Standardized tests) 
teacher-made (see Teacher-made tests) 
true-false, 399-400 

vocational aptitude (see Vocational 
aptitude tests) 

Tests in Print, 160 

Thematic Apperception Test (TAT), 
205-208 

administration of, 205-206 
evaluation of, 207-208 
scoring of, 206-207 

Thorndike Dimensions of Temperament 
(TDOT), 325-30 

description of dimensions, 326- 
37r. 





TDOT (Cont'd) 
sample set of items from, 328 fig. 
summary of findings, 329 
Time limits of tests, 70-71 
True-false tests, 399-400 

Validity, in evaluating a test, 67n., 91- 
104 

basic types of 

concurrent validity, 91n. 
construct validity, 101-103 
content validity, 92-93 
criterion-related validity, 94-100 
predictive validity, 91n. 
coefficient of validity, 99-100 
standard error of estimate, 99-100 
Variability, measures of (see Measures of 
variability) 

Vocational aptitude tests, 31 
clerical aptitudes, 231-32 
mechanical aptitudes, 232-35 
prognostic tests, 245-51 
artistic aptitude, 251 
foreign language aptitude, 249-50 
mathematical aptitude, 249 
musical aptitude, 250 
reading readiness, 245-49 
test batteries concerned with voca- 
tional prediction, 235-45 
Wechsler, David, 176^. 


Wechsler Scales, 146 fig., 150, 176- 
87 

diagnostic and clinical features of, 
182-86 

evaluation of, 187-88 
history of, 176-78 
meaning of scores, 181-82 

compared with Stanford-Binet, 181 
samples of subtests and their content, 
178-81 

standard deviation of, 181 
used to ascertain mental retardation, 
182-83, 183/. 

WAIS (Adult Intelligence Scale), 
177-82, 187 
tests in, 177 

WISC (Intelligence Scale for Child- 
ren), 177-82, 187 
tests in, 177-78 

WPPSI (Preschool and Primary Scale 
of Intelligence), 178, 187, 195 
tests in, 178 

Wechsler-Bellevue Scale, 177, 177n. 

Word association tests, 211 

Wundt, Wilhelm, 7 

"Your Aptitudes as Measured by the 
Differential Aptitude Tests,” 
excerpts from, 240-43 



